by yokem55
864 days ago
The problem (in my view) with the author's argument is that the first step he claims is happening is not. Publicly available content gets read, as is the point of publicly publishing it. Then the user runs a computer program to compute some statistics about that bit of content. Those statistics about that specific work, on their own, cannot reproduce or recreate the specific work. Then those statistics are put into a database and combined with the stats about billions of other works. Then another program is written to query the database and make probabilistic guesses in response to prompts from a user. It's this last stage that could potentially recreate a work in an infringing manner. But everything that led up to that point (creating the model) is simply not something that current law considers to be infringing copyright in any meaningful way. It doesn't even require a "fair use" assessment, because creating statistics about a work that cannot on their own reproduce the work does not create a copy, nor does it make a public performance of the work. Is this all terribly unfair to the people that published their work assuming this couldn't happen? Yes. But the response needs to be "let's come up with and pass better law" and not "let's twist and contort the current law to be something it's not."
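To make the pipeline described above concrete, here is a toy sketch, assuming a simple word-pair (bigram) count table as a stand-in for the "statistics". Real models use learned weights rather than count tables, and the function names and sample sentences here are made up purely for illustration.

```python
import random
from collections import Counter, defaultdict

def bigram_counts(text):
    """Statistics about ONE work: how often each word follows another."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def merge(models):
    """Combine the per-work statistics into one shared table (the 'database')."""
    combined = defaultdict(Counter)
    for model in models:
        for word, followers in model.items():
            combined[word].update(followers)
    return combined

def generate(model, start, length, seed=0):
    """Query the table: sample each next word in proportion to its count."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:
            break  # no recorded follower for this word, so stop
        words, weights = zip(*followers.items())
        out.append(rng.choices(words, weights=weights)[0])
    return out

# Hypothetical stand-ins for "billions of other works".
works = [
    "the cat sat on the mat",
    "the dog sat on the rug",
    "a cat chased the dog",
]
model = merge(bigram_counts(w) for w in works)
sample = generate(model, "the", 6)
print(" ".join(sample))
```

In the merged table, the counts from any single sentence are mixed in with the others, and the output is a weighted guess at each step rather than a lookup of a stored text.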
I love participating in armchair analysis of the law, since in software we pretty much have no choice but to do so anyway, but my understanding has always been that we still don't actually have strong case law for machine learning and AI. It does seem like the existing cases regarding weights and ML training have leaned strongly towards the weights in general not being considered a derivative work, but I doubt the law would see this as black and white. For example, even if the general consensus is that ML training to produce weights does not, in and of itself, create a derivative work, I suspect it would not be shrugged off so easily if you could show that a given set of weights can reproduce its inputs verbatim (as a result of overfitting or memorization). In true "color of my bits" fashion, I think that from a legal standpoint, the actual technical means by which something was accomplished doesn't matter if the process as a whole is effectively copyright infringement.
There do seem to be some ongoing cases regarding this, such as Getty Images v. Stability AI, and it will be interesting to see how they turn out.
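As a rough illustration of the memorization point, here is a toy sketch, assuming an n-gram count table as a stand-in for model weights (real networks memorize through very different mechanisms). The function names are hypothetical. With a short context the table only captures loose statistics, but with a long enough context relative to the training text, "statistics" alone are enough to replay the input word for word.

```python
from collections import Counter, defaultdict

def train(words, order):
    """Map each run of `order` words to counts of the word that follows it."""
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table

def greedy_generate(table, prompt, order, max_words):
    """Always pick the most frequent next word for the current context."""
    out = list(prompt)
    while len(out) < max_words:
        followers = table.get(tuple(out[-order:]))
        if not followers:
            break  # context never seen in training
        out.append(max(followers, key=followers.get))
    return out

work = "to be or not to be that is the question".split()

for order in (1, 3):
    table = train(work, order)
    output = greedy_generate(table, work[:order], order, len(work))
    # Reports whether the generated words match the training text exactly.
    print(order, output == work)
# prints:
# 1 False
# 3 True
```

With `order=1` the table loops on "to be or not" and never recovers the original line; with `order=3` every context in this text is unique, so the table effectively stores the whole work.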