To build AI chatbot Claude, Anthropic "destructively scanned" millions of copyrighted books, a judge wrote on Monday.

Ruling in a closely watched AI copyright case, Judge William Alsup of the Northern District of California analyzed how Anthropic sourced data for model training purposes, including from digital and physical books.

Companies like Anthropic require vast amounts of input to develop their large language models, so they've tapped sources ranging from social media posts to videos to books. Authors, artists, publishers, and other groups contend that the use of their work for training amounts to theft.

Alsup detailed Anthropic's training process with books: The OpenAI rival spent "many millions of dollars" buying used print books, which the company or its vendors then stripped of their bindings, cut into loose pages, and scanned into digital files.

Alsup wrote that millions of the original books were then discarded, and the digital versions were stored in an internal "research library."

The judge also wrote that Anthropic, which is backed by Amazon and Alphabet, downloaded more than 7 million pirated books to train Claude. Alsup wrote that Anthropic cofounder Ben Mann downloaded "at least 5 million copies of books from Library Genesis" in 2021, fully aware that the material was pirated. A year later, the company "downloaded at least 2 million copies of books from the Pirate Library Mirror," also knowing they were pirated.

Alsup wrote that Anthropic preferred to "steal" books to "avoid 'legal/practice/business slog,' as cofounder and CEO Dario Amodei put it."

Last year, a trio of authors sued Anthropic in a class-action lawsuit, saying the company used pirated versions of their books without permission or compensation to train its large language models.

Judge says training Claude on books was fair use, but piracy wasn't

Alsup ruled that Anthropic's use of copyrighted books to train its AI models was "exceedingly transformative" and qualified as fair use, a legal doctrine that allows...