by NicuCalcea
155 days ago
Models absolutely do reproduce books.

> With a simple two-phase procedure, we show that it is possible to extract large amounts of in-copyright text from four production LLMs. While we needed to jailbreak Claude 3.7 Sonnet and GPT-4.1 to facilitate extraction, Gemini 2.5 Pro and Grok 3 directly complied with text continuation requests. For Claude 3.7 Sonnet, we were able to extract four whole books near-verbatim, including two books under copyright in the U.S.: Harry Potter and the Sorcerer’s Stone and 1984.

https://arxiv.org/abs/2601.02671