

by Dylan16807 54 days ago

> Yes LLMs can reproduce passages from copyrighted works verbatim but that's only because it "learned" it and it's just telling you what it "knows".

Are you finding people that actually say this?

When it can quote something like that, it's a training error. A popular enough work gets quoted and copied by people online, and then those copies aren't properly deduplicated in the training data. It can only do that with a very small fraction of works, and the cleaner your data, the less it happens.
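To make the deduplication point concrete, here's a minimal sketch of exact-duplicate removal over text passages. It's only an illustration: the `normalize` and `deduplicate` helpers and the sample corpus are made up for this example, and real training pipelines generally need fuzzier near-duplicate detection than a plain hash.

```python
import hashlib
import re


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash alike.
    return re.sub(r"\s+", " ", text.lower()).strip()


def deduplicate(passages):
    # Keep the first occurrence of each normalized passage, drop later copies.
    seen = set()
    unique = []
    for passage in passages:
        digest = hashlib.sha256(normalize(passage).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(passage)
    return unique


# Hypothetical corpus: the second entry is a re-posted copy of the first.
corpus = [
    "It was the best of times, it was the worst of times.",
    "It was the best of times,  it was the WORST of times.",
    "A completely different sentence.",
]
kept = deduplicate(corpus)
print(len(kept), "of", len(corpus), "passages kept")
```

Anything that slips past a filter like this gets seen many times during training, which is how a popular passage ends up over-represented.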

I'll once again point out that Stable Diffusion launched with fewer weights than training images. It had some accidental memorizations, but there wasn't room for its core functionality to be memorization-based.
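A back-of-the-envelope version of that capacity argument, as a sketch. The counts below are illustrative round numbers I'm assuming for the arithmetic (about a billion weights, about two billion training images, 16-bit weights), not exact figures for any release.

```python
def bytes_per_image(num_weights, num_images, bytes_per_weight=2):
    # Average storage budget in the weights per training image.
    return num_weights * bytes_per_weight / num_images


# Assumed round numbers for illustration only, not measured values.
ASSUMED_WEIGHTS = 1_000_000_000
ASSUMED_IMAGES = 2_000_000_000

budget = bytes_per_image(ASSUMED_WEIGHTS, ASSUMED_IMAGES)
print(f"{budget:.1f} bytes of weights per training image")
```

Under those assumptions the budget is on the order of a single byte per image, while even a heavily compressed thumbnail takes thousands of bytes, so storing the training set wholesale isn't an option.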