Hacker News
by lporto 934 days ago
>we found 72,508 ebook titles (including 83 from Stanford University Press) that were pirated and then widely used to train LLMs despite the protections of copyright law

https://aicopyright.substack.com/p/the-books-used-to-train-l...