Hacker News
by api 1058 days ago
It’s a form of lossy compression. Can I strip the copyright off an image by JPEG compressing it?

At the very least I think LLMs trained on data that the trainer does not own or have rights to use in that manner should not be copyrightable.

2 comments

All knowledge is lossy compression.

When I see the tokens “Ender’s Game” and think “the enemy gate is down,” I am recalling a learned association between those tokens and that string.

My knowing that doesn’t strip the copyright. My telling someone the meaning and context of the phrase generally doesn’t strip the copyright away from Orson Scott Card. I’m reproducing not his work but my knowledge of it. Whether I’ve violated his copyright depends on what I do with that knowledge and how.

We are prosecuting LLMs for possessing fragments of knowledge, and we’re assuming that the recall of some of those fragments means a copy of the work is in fact contained within the weights.

An LLM is a lossy compression of the internet and I think it should be treated as such. You can't copyright the internet itself.