Authors Sue Microsoft Over Alleged Use of Pirated Books in AI Training (RT)
A group of prominent authors has filed a class-action lawsuit against Microsoft in a New York federal court. The lawsuit, lodged on June 24, 2025, accuses the tech conglomerate of using nearly 200,000 pirated books to train its Megatron artificial intelligence model without permission or compensation. This case adds to a growing wave of litigation targeting tech companies over the unauthorized use of copyrighted material to develop generative AI systems.
The plaintiffs, including acclaimed writers Kai Bird, Jia Tolentino, and Daniel Okrent, allege that Microsoft relied on a controversial dataset known as "Books3," which contains nearly 200,000 pirated digital books. According to the complaint, this dataset was used to train Megatron, a large language model designed to generate human-like text, to mimic the style, syntax, and themes of the authors’ works. The authors claim this constitutes a blatant violation of their copyrights, as their books were neither licensed nor purchased for this purpose.
The lawsuit contends that Microsoft’s actions not only infringe on intellectual property rights but also threaten the livelihoods of writers by enabling AI systems to produce content that could compete with human-authored works. The plaintiffs are seeking an injunction to halt Microsoft’s alleged infringement and are demanding statutory damages of up to $150,000 per infringed work, a sum that could amount to billions given the scale of the alleged misuse.
This lawsuit is one of several high-profile cases pitting creative industries against tech companies over the use of copyrighted material in AI training. Similar legal actions have been filed against Meta Platforms, Anthropic, and Microsoft-backed OpenAI, reflecting growing concerns about the ethical and legal implications of generative AI. The creative community, including authors, journalists, musicians, and visual artists, argues that tech companies are exploiting their work to build lucrative AI systems without fair compensation or consent.
A key point of contention in these cases is the legal doctrine of "fair use," which allows limited use of copyrighted material without permission under certain conditions. Tech companies, including Microsoft, have argued that their AI systems make transformative use of copyrighted works by analyzing them to create new content, rather than reproducing the original works verbatim. However, the authors in this case assert that Microsoft’s AI training process undermines the market for their books by enabling the creation of derivative content that could diminish demand for their original works.
The Microsoft lawsuit comes on the heels of a significant ruling in a related case involving Anthropic, an AI company backed by Amazon and Alphabet. On June 23, 2025, a federal judge in San Francisco ruled that Anthropic’s use of copyrighted books to train its Claude AI model constituted "fair use" because the AI did not reproduce the authors’ works or their identifiable styles. However, the same judge found that Anthropic’s storage of over 7 million pirated books in a "central library" was not fair use and ordered a trial to determine damages.
This split ruling highlights the complexity of applying copyright law to AI training. While the Anthropic decision offers some legal support for tech companies, it also underscores the risks of using pirated materials. In a separate case decided on June 25, 2025, another federal judge in San Francisco ruled in favor of Meta Platforms, finding that the plaintiff authors had failed to show that Meta’s AI training harmed the market for their works, while cautioning that the ruling reflected the weakness of the plaintiffs’ arguments rather than a blanket endorsement of Meta’s practices.
The Microsoft lawsuit and related cases could have far-reaching consequences for the rapidly growing AI industry. Critics argue that unrestricted use of copyrighted material for AI training could erode incentives for human creativity, as AI-generated content floods markets with low-cost alternatives. The plaintiffs in the Microsoft case echo this concern, warning that tech companies are creating systems that could “dramatically undermine the market” for human-authored works.
On the other hand, AI companies contend that their systems promote innovation by learning from existing works to generate novel content. They argue that requiring payment for every piece of training data could stifle the development of AI technologies, which have the potential to revolutionize industries ranging from healthcare to education. The outcome of these lawsuits could set critical precedents for how copyrighted material is used in AI development and whether tech companies will need to negotiate licensing agreements with content creators.
Microsoft has not yet publicly commented on the lawsuit, but the company has previously denied using customer data from its Microsoft 365 applications to train AI models, emphasizing that its practices comply with legal standards. The tech giant’s partnership with OpenAI, which powers ChatGPT, has also drawn scrutiny, with OpenAI facing its own copyright lawsuits from authors and news organizations.
The broader creative industry is grappling with the implications of generative AI. Major record labels have sued AI music generators, while photography giant Getty Images has taken legal action against Stability AI over the use of its images in AI training. The New York Times and Dow Jones have also filed lawsuits against AI companies for alleged copyright infringement, signaling a growing push to protect intellectual property in the age of AI.
As the Microsoft lawsuit moves forward, it will likely intensify debates over the balance between technological innovation and the rights of content creators. Legal experts anticipate that the case could influence government policies on copyright protections for AI, both in the United States and globally. In the UK, for instance, Getty Images’ ongoing lawsuit against Stability AI is expected to shape copyright law and inform regulatory frameworks.
For authors like Kai Bird, Jia Tolentino, and Daniel Okrent, the lawsuit represents a stand against what they see as the unauthorized exploitation of their life’s work. As one of the plaintiffs’ attorneys stated, “We’re not opposed to innovation; we’re opposed to the theft behind the innovation.” The resolution of this case could determine whether tech companies like Microsoft can continue to use vast datasets of copyrighted material to fuel their AI ambitions or whether they will need to adopt new models of collaboration with creators.
The trial’s outcome, along with parallel cases, will likely shape the future of generative AI, intellectual property law, and the creative economy for years to come. As the legal battles unfold, the tension between technological progress and artistic rights remains at the forefront of this evolving debate.