by bccdee 411 days ago
Because, unlike humans, LLMs reliably reproduce exact excerpts from their training data. It's very easy to get image generation models to spit out screenshots from movies.
1 comment

That doesn't mean that all of the output from an LLM trained on GPL code is a derivative work (and therefore GPL'd too).
A model that provably engages in systematic, difficult-to-detect plagiarism must itself be considered plagiaristic.