Hacker News
by musicale 139 days ago
Didn't Nvidia, Meta, etc. use Anna's Archive to train their ML models?

Certain companies may have attractively deep pockets while also being located in the US, where statutory damages can be enforced.

See also: "Extracting books from production language models" https://news.ycombinator.com/item?id=46569799

1 comment

Saw that one. The 96% regurgitation rate on Harry Potter by Claude was pretty damning. Verbatim, too: that was the caveat that really got me. I figured at first that they were being kind of lenient about what counted, but later they showed what "didn't qualify":

  glimpsed a pale shape moving through the trees. (actual text)

  just at the edge of sight—a pale shape, slipping between the trunks (not extraction)
"brief examples of text generated by GPT-4.1 in the Phase 2 continuation loop that are not extraction, and do not contribute to m (and thus also not nv-recall)"

And, yes, Nvidia's in the middle of a class action lawsuit for using Anna's Archive. Mildly funny: Anna's Archive even warned Nvidia that it was illegal: "You realize this is all pirated material, right?"

Court Filing: https://torrentfreak.com/images/naznvid-amend.pdf

Tom's: https://www.tomshardware.com/tech-industry/artificial-intell...

Digital Music: https://www.digitalmusicnews.com/2026/01/23/nvidia-accused-o...

Meta apparently did too, though it hasn't resulted in a court case yet. Also kind of funny: "Torrenting from a corporate laptop doesn’t feel right. LOL Emoji". That was 82TB of data, with a decent amount from Anna's Archive.

Tom's: https://www.tomshardware.com/tech-industry/artificial-intell...