HN Mirror

by kybernetikos 1023 days ago

I am not a lawyer, but it seems right to me to say that the weights are a derivative work of the training set.

> A “derivative work” is a work based upon one or more preexisting works, such as a translation, musical arrangement, dramatization, fictionalization, motion picture version, sound recording, art reproduction, abridgment, condensation, or any other form in which a work may be recast, transformed, or adapted. A work consisting of editorial revisions, annotations, elaborations, or other modifications, which, as a whole, represent an original work of authorship, is a “derivative work”.

As I understand it, derivative works must be created with the legal use of the original work, or be fair use, otherwise they are infringing.

crazygringo 1023 days ago

No, as you can see from your very definition. But here's a good example:

If you take a book and turn it into a movie, that's a derivative work. Anyone can see the direct resemblance -- the transformation or adaptation.

But if you take a book, convert each letter to a number, add up the numbers that make each sentence, and then sell that as a list of "random" numbers, that's not a derivative work. The end result is sufficiently transformed that copyright no longer applies. Ownership of the original work has no relevance.
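A minimal sketch of the letter-summing scheme described above, assuming a=1 through z=26, ignoring everything that is not an ASCII letter, and splitting sentences naively on terminal punctuation. The comment specifies none of these details, and the function names are hypothetical:

```python
import re
import string


def letter_value(ch):
    """Map an ASCII letter to 1..26 (case-insensitive); anything else is 0.

    The a=1..z=26 mapping is an assumption made for illustration.
    """
    if ch in string.ascii_letters:
        return ord(ch.lower()) - ord("a") + 1
    return 0


def sentence_sums(text):
    """Return one integer per sentence: the sum of its letter values."""
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [sum(letter_value(c) for c in s) for s in sentences]


if __name__ == "__main__":
    print(sentence_sums("Abc. Hi!"))
```

Under these assumptions the mapping is many-to-one: "Abc." and "Cab." produce the same sum, so the original sentences cannot be recovered from the list of numbers.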

And AI weights are like that. They're a complete transformation. They're not a derivative work. The only thing you have to make sure of is that they haven't been overtrained to the extent that they can regurgitate whole chapters of the texts they were trained on, for example. But that's not something they're currently able to do, and obviously copyright law will force companies to ensure it stays that way. (Not to mention that companies would do it anyways, due to the economic motivation of reducing model sizes to cut costs.)

fsckboy 1021 days ago

>convert each letter to a number, add up the numbers that make each sentence...The end result is sufficiently transformed that copyright no longer applies

The problem with this as an example is that copyright would not apply to this transformative work at all, neither the original author's copyright nor your new authorship, because it contains no creative human expression (unless the original book was designed to add up to some fortune cookie, of course, in which case you have not transformed it).

A nuttier, chewier example would be retelling a litigious story like Moana ("consider the copyright, across all these leaves... make way!"), from the pig's perspective or something, and seeing what would fly and what wouldn't.

kybernetikos 1023 days ago

Weights are simply a lossy compression of the training data set.

Now, I understand the argument that perhaps the specific work has been homeopathically diluted down to nothingness in the weights and so therefore has only been used to contextualise the compression process of other works, but if the weights can be reasonably used to generate copyright infringing text (and condensations and abridgements and transformations are explicitly listed in the law, verbatim copying is not necessary), or even answer substantial questions about it, then that shows that the weights included that data.

If I take a sound file and compress it down so it's poor quality but I can still make out the tune, that doesn't mean that I've avoided copyright law.

crazygringo 1023 days ago

> Weights are simply a lossy compression of the training data set.

No they're not -- they're more like the dictionary generated to produce a lossless compressed data set. But then we throw out the compressed data itself, and keep only the dictionary.

> but if the weights can be reasonably used to generate copyright infringing text (and condensations and abridgements and transformations are explicitly listed in the law, verbatim copying is not necessary)

First of all, they haven't been shown to generate infringing text beyond the kinds of short snippets covered by fair use. And my previous comment already explained that longer texts are not going to happen, for both legal and economic reasons.

But secondly, you're wrong about "condensations and abridgements and transformations". You can absolutely sell a page-long summary of a book without getting permission, for instance. What do you think things like CliffsNotes are all about? Or all those two-page "executive summaries" of popular business books?

You can't abridge a 1,000-page book to 500 pages and sell that, but you can summarize its ideas in a page and sell that. Which is roughly the level of understanding that LLMs seem to absorb.
