Hacker News
by xyzal 108 days ago
Are training data counted as input?

It would be interesting to see a court ruling that the output of LLMs trained on copyleft code is licensed under the GPL ... and all other viral licenses simultaneously

> Are training data counted as input?

It is quantum legality: using copyrighted input is legal or illegal depending on the observer.

Schrodinger's Chat
Unless your LLM works by quoting large parts of copyrighted works, reinterpretations of them aren't copyrighted, because they're not copies.
What if the output regurgitates some other legal entity’s boilerplate licence agreement? Is the output automatically licensed to that entity?
No, the copyright is the colour of the bits, and red bits with a comment saying "these bits are blue" are not blue bits, but you may be prosecuted for fraud.
It's wild to me that there haven't been more court cases to answer questions like those being asked in this thread.

No one knows.

It's new, fast-moving technology, and the courts are slow and expensive.

It would take two stubborn businesses with a lot of money deciding that it is better to battle it out than to focus on their business. Something like SCO v. IBM or Oracle v. Google.