Hacker News
by boothby 529 days ago
If I understand the position of major players in this field, downloading models in bulk and training an ML model on that corpus shouldn't violate anybody's IP.

IANAL, but this is not true: the model would be a piece of the software. If there is a copyright on the app itself, it would extend to the model. Even models have licenses; for example, LLAMA is released under this license [1].

[1] https://github.com/meta-llama/llama/blob/main/LICENSE

The fact that model creators assert that they are protected by copyright and offer licenses does not mean:

(1) That they are actually protected by copyright in the first place, or

(2) That the particular act described does not fall under an exception to copyright like fair use (exactly as many model creators assert the same act does when done with the materials their models are trained on), which would render the restrictions of the offered license moot for that purpose.

LLMs are trained on works -- software, graphics and text -- covered by my copyright. What's the difference?
The difference is that you pulling out a model is you potentially violating copyright, while the model itself being trained on copyrighted works is them potentially violating copyrights.

I.e. the first one concerns you, the other is none of your business.

Them potentially violating my copyrights is very much my business. But you're right, the difference is how much the respective parties have to spend on legal battles.

Simply showing up to court wearing a T-shirt that says "what she said" probably wouldn't fly, but I like to imagine that any arguments made by them about their copyrights would be equally true of my copyrights.

At this point I'm mostly wondering if "you ripped me off first" is a viable legal defense in copyright battles where it's unclear if either party is distributing the works of the other. One thing is for sure, though: if I were to do this as an individual, the discovery process would be much more expensive for them than for me.

If I understand the position of major players in this field, copyright itself is optional (for them at least).
True, I think there has to be a case that sets precedent for this issue.
They claim “safe harbour”: if nobody complains, it’s fair game.
Is there a material difference between the copyright laws for software and the copyright laws for images and text?
Yeah no.

An example for legal reference might be convolution reverb. Basically, it's a way to record what a fancy reverb machine does (using copyrighted complex math algorithms) and cheaply recreate the reverb on my computer. It seems like companies can do this as long as they distribute the protected reverbs separately from the commercial application. So Liquidsonics (https://www.liquidsonics.com/software/) sells reverb software but offers the 'protected' convolution reverbs, specifically the Bricasti ones in dispute, as a separate free download (https://www.liquidsonics.com/fusion-ir/reverberate-3/).
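For anyone unfamiliar with the technique, here is a minimal sketch of what convolution reverb does, assuming a made-up impulse response (the `ir` values below are hypothetical, not taken from any real hardware or from the Liquidsonics downloads). The point is that the "recording" of the reverb unit is just data, and the player that applies it is a generic convolution:

```python
def convolve(dry, impulse_response):
    """Direct-form discrete convolution of a dry signal with an impulse response."""
    out = [0.0] * (len(dry) + len(impulse_response) - 1)
    for i, sample in enumerate(dry):
        for j, tap in enumerate(impulse_response):
            # Each input sample triggers a scaled, delayed copy of the response.
            out[i + j] += sample * tap
    return out


# Hypothetical impulse response: a direct hit followed by two decaying echoes.
ir = [1.0, 0.0, 0.5, 0.0, 0.25]

# A dry signal consisting of a single click followed by silence.
dry = [1.0, 0.0, 0.0]

wet = convolve(dry, ir)
print(wet)
```

Real plugins use FFT-based convolution for speed, but the separation is the same: the application ships the `convolve` part, and the impulse response is a separate file.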

Also, while SQL server software can be copyright protected, that protection does not extend to the databases it hosts: the creators of the server software gain no copyright or ownership over a SQL database.