But what is being compressed is the entire corpus of text: it's compressed into the model weights. It's the weights that might fall under the copyright of the authors whose texts were used to train the model.

The weights are also executable code (in some sense): when you query an LLM, you're running this program with a given input. Sure, when it runs it outputs a whole lot of things (sometimes novel combinations, sometimes verbatim repetition of training data), but the point here isn't whether the output of the LLM is copyrighted; it's whether the weights are.