by nextaccountic 1034 days ago
Any form of lossy compression is an irreversible transformation. We do it all the time for video, audio, and images (you can't recover the original data), and the compressed works are still copyrighted.
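To make "irreversible" concrete, here's a minimal sketch. It assumes plain uniform quantization as a stand-in for a real codec (actual video/audio codecs are far more elaborate), and the names `quantize` and `dequantize` are made up for illustration. Two different inputs collapse to the same compressed form, so nothing can tell you which one you started with:

```python
def quantize(samples, step=16):
    """Toy lossy 'compression': keep only which step-sized bucket each sample falls in."""
    return [s // step for s in samples]


def dequantize(codes, step=16):
    """Best-effort reconstruction: the remainder dropped by quantize() is gone."""
    return [c * step for c in codes]


# Two different "originals" (hypothetical sample values).
original_a = [17, 200, 95]
original_b = [30, 207, 80]

codes_a = quantize(original_a)
codes_b = quantize(original_b)

# Both inputs map to the same codes, so the mapping is many-to-one.
same_codes = codes_a == codes_b

# Decoding gives an approximation, not either original.
restored = dequantize(codes_a)
```

The decoded result is recognizably close to both inputs, which is the whole point of the analogy: lossy doesn't mean unrelated.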
1 comments

When you compress a video, it doesn't create a new movie with a different story, different lines of text, different scenes, and different compositions for scenes that are similar to the "original".
But what is being compressed is the entire corpus of text. It's compressed into model weights. It's the weights that might be under the copyright of the authors of the texts the model was trained on.

The weights are also executable code (in some sense). When you query an LLM, you're running this program with a given input. Yeah, when it runs it outputs a whole lot of things (sometimes novel combinations, sometimes verbatim repetition of training data), but the point here isn't whether the output of the LLM is copyrighted; it's the weights.