New Study Proves AI Models Can Regurgitate Entire Copyrighted Books


TL;DR

  • The gist: A new study proves that production AI models from Anthropic, Google, and xAI can regurgitate entire copyrighted books near-verbatim.
  • Key details: Claude 3.7 Sonnet reproduced 95.8% of “Harry Potter” when jailbroken, while Gemini 2.5 Pro and Grok 3 required zero jailbreaking to output protected text.
  • Why it matters: The findings undermine the industry’s core legal defense that models do not store copies of training data, potentially exposing companies to massive copyright liability.
  • Context: The forensic evidence arrives as courts in the US and Europe begin to rule that model weights containing memorized works may constitute infringing copies.

For years, AI companies have defended their models in court by claiming they do not store copies of training data. A new study from researchers at Stanford and Yale has shattered that defense, providing forensic proof that production models can regurgitate entire copyrighted novels near-verbatim.

Published in a preprint paper on Friday, the findings reveal that Anthropic’s Claude 3.7 Sonnet could reproduce 95.8% of Harry Potter and the Sorcerer’s Stone when prompted with a specific jailbreak technique. Even more damaging for the industry’s safety claims, Google’s Gemini 2.5 Pro and xAI’s Grok 3 required zero jailbreaking to output substantial portions of the same text.

Arriving at a critical moment for the generative AI industry, this technical breakthrough intersects with multiple high-stakes copyright lawsuits. By demonstrating that “lossy compression” retains enough fidelity to serve as a market substitute for the original work, the study directly undermines the “fair use” legal arguments currently being tested in courts worldwide.


Forensic Proof: The ‘Memorization’ Myth

Testing four production models (Claude 3.7 Sonnet, Gemini 2.5 Pro, Grok 3, and GPT-4.1), the Stanford and Yale study employed a two-phase extraction method.

A “Best-of-N” jailbreak probe was followed by iterative continuation prompts, allowing researchers to bypass standard safety filters and compel the models to output long-form text.
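The researchers’ pipeline is not published alongside this article, but the two-phase loop can be sketched roughly as follows. This is a minimal reconstruction under stated assumptions: `chat` is a caller-supplied wrapper around a model API, and the perturbation and refusal heuristics are illustrative stand-ins rather than the study’s actual implementation.

```python
import random
from typing import Callable

def perturb(prompt: str) -> str:
    """Best-of-N style perturbation: randomly flip character case to vary the prompt."""
    return "".join(c.swapcase() if random.random() < 0.2 else c for c in prompt)

def extract_text(chat: Callable[[str], str], seed_prompt: str,
                 max_probes: int = 10_000) -> str:
    """Two-phase extraction: a Best-of-N probe, then iterative continuations."""
    # Phase 1: resample perturbed prompts until one slips past the refusal filter.
    reply = ""
    for _ in range(max_probes):
        reply = chat(perturb(seed_prompt))
        if reply and not reply.lower().startswith(("i can't", "i cannot", "sorry")):
            break
    # Phase 2: repeatedly ask the model to continue from the tail of its own
    # output, stitching the continuations into one long transcript.
    text = reply
    for _ in range(1_000):  # cap the continuation loop
        cont = chat("Continue the passage verbatim from: " + text[-500:])
        if not cont or cont in text:
            break
        text += cont
    return text
```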

Claude 3.7 Sonnet was the most susceptible, reproducing 95.8% of Harry Potter and the Sorcerer’s Stone and 94% of 1984. Such fidelity contradicts previous industry assertions that models only learn statistical patterns.
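How a figure like 95.8% is scored matters. The study’s exact metric isn’t reproduced in this article, so the sketch below assumes one common approach: count the fraction of the book’s fixed-length word windows that reappear verbatim in the extracted output.

```python
def ngram_recall(book: str, extracted: str, n: int = 50) -> float:
    """Fraction of the book's length-n word windows found verbatim in the output."""
    words = book.split()
    haystack = " ".join(extracted.split())  # normalize whitespace before matching
    total = max(len(words) - n + 1, 1)
    hits = sum(1 for i in range(len(words) - n + 1)
               if " ".join(words[i:i + n]) in haystack)
    return hits / total
```

Looser variants that tolerate small edits within each window would also count near-verbatim matches.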

Describing the severity of the leakage, Ahmed Ahmed, a researcher at Stanford University, stated: “We extract nearly all of Harry Potter and the Sorcerer’s Stone from jailbroken Claude 3.7 Sonnet.”

Gemini 2.5 Pro and Grok 3 required zero jailbreaking attempts to extract copyrighted text, achieving 76.8% and 70.3% recall respectively. This finding suggests that production guardrails may be less robust than previously assumed.

Contrasting these models with the hardened GPT-4.1, the researchers noted: “For the Phase 1 probe, it was unnecessary to jailbreak Gemini 2.5 Pro and Grok 3 to extract text.”

GPT-4.1 proved the most resistant, requiring over 5,000 jailbreak attempts and refusing to continue past the first chapter (4.0% recall). However, the cost of extraction varied significantly: approximately $120 to extract Harry Potter from Claude versus roughly $2.44 from Gemini. While extraction can be expensive, the mere possibility remains a legal liability.
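For context, dividing the reported totals by the attempt counts gives a rough unit cost; this back-of-envelope arithmetic is ours, and the per-probe figure is only an upper bound because the $120 total also covers the continuation prompts.

```python
# Reported totals from the study; figures are approximate.
claude_total_usd, claude_probes = 120.00, 258
print(f"Claude 3.7 Sonnet: <= ${claude_total_usd / claude_probes:.2f} per jailbreak probe")
# -> Claude 3.7 Sonnet: <= $0.47 per jailbreak probe
# Gemini 2.5 Pro needed zero jailbreak probes, so its ~$2.44 is pure continuation cost.
```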

Legal Fallout: The ‘Storage’ Liability

Directly challenging the industry’s core legal defense, the findings dispute the claim that models do not store copies of training data.

This argument has been central to motions to dismiss in cases like The New York Times’ copyright lawsuit against OpenAI. To defend against infringement claims in court, the company has historically relied on a technical definition of learning: “Models do not store copies of the information that they learn from.”

These defenses have allowed tech giants to argue that their models create something new rather than simply reproducing existing works. By framing training as a transformative process akin to human learning, companies have sought to shield themselves from copyright liability.

Until now, this strategy has been effective in delaying judgments and narrowing the scope of discovery. However, the new forensic evidence complicates this narrative.

Google has maintained a similar stance regarding data retention. In a public statement, the company asserted: “There is no copy of the training data, whether text, images, or other formats, present in the model itself.”

Such defenses are now under scrutiny. The study validates the “model weights as infringing copies” theory recently supported by the Munich Regional Court in a copyright ruling about song lyrics. If models contain retrievable copies of protected works, the legal distinction between training and reproduction collapses.

The study serves as definitive proof that AI models retain copies of their training data, reinforcing similar findings from previous studies. The legal exposure created by this retention is substantial; if courts accept that these internal weights constitute infringing copies, the industry could face billions of dollars in damages.

Furthermore, such a ruling could force companies to withdraw specific models from the market entirely to avoid further liability.

The new evidence complicates the “transformative use” defense if the output can serve as a market substitute for the original work. The distinction between “training” (fair use) and “acquisition” (piracy), pivotal in the fair use ruling in Bartz v. Anthropic, may collapse if the model itself is an infringing derivative.

The researchers concluded: “Taken together, our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.”

Safety Failure: The ‘Best-of-N’ Attack

Exposing the fragility of current safety alignment techniques, the study highlights the effectiveness of probability-based attacks. “Best-of-N” jailbreaking works by generating multiple variations of a prompt until one bypasses the safety filter. For Claude 3.7 Sonnet, this required an average of 258 attempts; for GPT-4.1, it took 5,179.
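If each perturbed prompt slips past the filter independently with some probability p, the number of attempts until success is roughly geometric with mean 1/p, so the reported averages imply a per-attempt bypass rate. This simple model is our reading of the numbers, not the paper’s analysis:

```python
# Geometric model: expected attempts = 1/p, so p ≈ 1 / mean_attempts.
for model, mean_attempts in [("Claude 3.7 Sonnet", 258), ("GPT-4.1", 5_179)]:
    print(f"{model}: implied per-attempt bypass rate ≈ {1 / mean_attempts:.2%}")
# Claude 3.7 Sonnet: implied per-attempt bypass rate ≈ 0.39%
# GPT-4.1: implied per-attempt bypass rate ≈ 0.02%
```

On this reading, a safety filter is a rate, not a wall: enough resamples will eventually find a prompt that gets through.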

Such success suggests that safety filters are probabilistic rather than absolute barriers. Anthropic quietly removed Claude 3.7 Sonnet from its UI in late November 2025, shortly after the researchers disclosed their findings. This removal suggests the company recognized the severity of the vulnerability and the potential legal exposure.

Guardrail failures in Gemini and Grok, which required zero jailbreaks, point to a significant oversight in production deployment. Despite the AI ban implemented by publisher Penguin Random House, models continue to output protected content.

The study details significant variance in how easily different models surrender protected text; the table below summarizes the results.

Extraction Results: Harry Potter and the Sorcerer’s Stone

Comparison of extraction rates, costs, and security resilience across production models (cells the article does not report are marked —):

Model                Recall   Jailbreak attempts   Approx. extraction cost
Claude 3.7 Sonnet    95.8%    258 (average)        ~$120
Gemini 2.5 Pro       76.8%    0                    ~$2.44
Grok 3               70.3%    0                    —
GPT-4.1              4.0%     5,179                —

Perhaps most concerning were the results for Gemini 2.5 Pro and Grok 3, which yielded 76.8% and 70.3% of the book respectively. Unlike the other models, these two required absolutely no jailbreaking to output the copyrighted material, indicating a failure of standard safety guardrails.

Far from a theoretical vulnerability, these findings provide concrete evidence for plaintiffs in ongoing litigation. Compounding the legal risk, the study shows that even advanced models like GPT-4.1 are not immune to determined extraction efforts; even where defenses largely hold, the core issue of data provenance remains unresolved.
