Last week, Judge William Alsup of the federal court in San Francisco delivered a highly anticipated ruling in the case brought by three authors, Andrea Bartz, Charles Graeber, and Kirk Wallace Johnson, against Anthropic, the California unicorn behind the Claude AI. The order, grounded in the fair use doctrine, marks a turning point in the debate over the use of copyrighted works to train AI models.
Between 2021 and 2023, Anthropic downloaded more than 7 million pirated books from sites such as Books3, LibGen, and PiLiMi. After recognizing the legal risks of relying on pirated copies, the company began legally purchasing hundreds of thousands of these books in the spring of 2024, scanning them after removing their bindings, headers, and footers, and then destroying the physical copies. The resulting files remained in Anthropic's internal library even after the company decided that some of them would not, or would no longer, be used to train its Claude models.
Bartz's novels, Graeber's essays, and Johnson's narratives appear among both the pirated books and those purchased legally, often second-hand. The three authors filed this class-action lawsuit against Anthropic for copyright infringement, arguing that their works were used without their consent and without compensation.
Without resolving every question raised by the case, Judge Alsup clarified two essential points. On one hand, he found that Anthropic's use of legally acquired books, digitized and incorporated into its training corpus, constituted fair use under American law. The judge compared the process to an author or researcher drawing on their reading to produce original work, emphasizing the transformative nature of the use. In his view, the authors' lawsuit
"is no different than if they complained that training schoolchildren to write well would lead to an explosion of competing works".
On the other hand, he clearly distinguished this lawful use from the retention of the pirated digital copies. In his view, building an internal library from stolen books cannot be excused by a right to innovate or to conduct research. That part of the dispute will proceed to a trial scheduled for December, where Anthropic could be held liable for willful copyright infringement.
The company might then face litigation on an entirely different scale if the judge approves the inclusion of thousands of authors in the case. If the class is certified, Anthropic could be required to pay each of them up to $150,000 per infringed work.
Unless overturned on appeal, this landmark decision could set a precedent and influence other ongoing disputes across the AI sector.