Hacker News
by Ukv 919 days ago
> There’s no question these neural networks and their output are derivative works.

In the two US cases we have any progress on so far, the established requirement for substantial similarity (as opposed to "dependent on" or such) has been upheld, with Judge Vince Chhabria specifically setting out that it'd "have to mean that if you put the Llama language model next to Sarah Silverman's book, you would say they're similar", and Judge William H. Orrick agreeing with the defendants that "plaintiffs cannot plausibly allege the Output Images are substantially similar or re-present protected aspects of copyrighted Training Images, especially in light of plaintiffs’ admission that Output Images are unlikely to look like the Training Images".

The UK definition of derivative works is, to my understanding, narrower and specifically enumerated as opposed to the US's more open-ended definition.

The remaining area of doubt, assuming the above remains consistent, is over the transient copying that occurs during training.

3 comments

> the transient copying that occurs during training.

I think this should be dismissed, as it is the same level of transience as the workings of the internet: you, your ISP, caching proxies, etc. all make a transient copy as part of the existing (legal) consumption of the works that the author has put online.

Unless the works were illegally copied for training (which cannot be true if the works were publicly available for viewing on the internet), this transient copying cannot be a valid infringement claim.

Doing something a little isn’t the same as doing something a lot. You can walk into a restaurant and look at a menu for 5 minutes and then leave without issue, but try to do that same thing for 8 hours.

Downloading a single transient copy of some image once in the lifetime of a company is different from doing that same action a hundred times, once for each version of the network.

This case involves many examples of substantial similarity. Worse, it’s precedent that generative AI doesn’t necessarily avoid creating such examples.

Defendants can easily argue that being 1/10 millionth or whatever of the training set means a plaintiff's specific work is unlikely to show up in any specific example, but the underlying mechanism means it can be recreated.

The defendants will evidently claim transient copying.
I doubt these companies are constantly downloading the full training set rather than keeping it in a database somewhere.

Hard to argue keeping a copy of some copyrighted work indefinitely counts as transient.

> I doubt these companies are constantly downloading the full training set rather than keeping it in a database somewhere.

Precisely to argue for transient copies, they don't need to keep terabytes of data stored.

> Hard to argue keeping a copy of some copyrighted work indefinitely counts as transient.

You're assuming that they're keeping the works indefinitely, which again is not the case.

> Precisely to argue for transient copies, they don't need to keep terabytes of data stored.

Those kinds of legal workarounds rarely work.

They are dependent on persistent access, which gives them the equivalent benefit of keeping a persistent copy.