	by noboostforyou 480 days ago
	For that argument I believe the question becomes "is the output of a model considered a derivative work of the training data?" https://www.copyright.gov/circs/circ14.pdf

ninalanyon 480 days ago

What else could it be?

Ajedi32 479 days ago

An original composition based on a statistical analysis of the training data. Statistical data about a copyrighted work obviously isn't necessarily a derivative of that work. Otherwise Tolkien could sue me for telling you how many times The Lord of the Rings uses the word "the".
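To make the "statistical data" point concrete, here is a minimal Python sketch of that kind of word count. The function name `word_counts` and the sample sentence are made up for illustration; nothing here comes from any real model's training pipeline.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Count case-insensitive word occurrences in a piece of text."""
    # Lowercase the text and pull out runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# Hypothetical sample text, standing in for a full book.
sample = "The cat sat on the mat while the dog slept."
counts = word_counts(sample)
print(counts["the"])  # 3
```

The resulting counts say something about the text, but the text cannot be rebuilt from them.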

rasz 479 days ago

Can it reproduce training data? Then it's not analysis but compression, lossy compression.

Ajedi32 477 days ago

For most LLMs, with most works, no.

If you trained an LLM repeatedly on nothing but the text of LOTR until it could reproduce the books verbatim and then tried to sell copies of that LLM, then I agree that would be blatant copyright infringement, yes.

monocasa 479 days ago

The industry is banking on Authors Guild v. Google serving as precedent, on the theory that the result is functionally transformative enough to be a completely new work.

https://en.wikipedia.org/wiki/Authors_Guild,_Inc._v._Google,....

I think they have about a coin-flip chance that this passes muster in the courts.
