
You’re right that it’s not so clear; perhaps I overstated for brevity. I don’t actually think requesting permission is absolutely necessary. What I really think is that there aren’t good reasons AI people shouldn’t at least first try to establish training sets that are unambiguously legal, either through use of public domain work or through an actual attempt to curate licensing models that allow re-use. We have plenty of precedent for doing this, so people claiming they should have access to everything without permission strikes me as lazy. There’s also the problem that the AI winners already are, and will continue to be, the monopoly tech and media companies who stand to make handsome profits off of the results of their trained networks. Even if you believe the results of their tech are “transformational”, there is no question that it wouldn’t work at all without access to the source material.

The argument that NNs aren’t memorizing is definitely debatable and not necessarily true. They are designed to memorize deltas and averages from examples. They are, at the most fundamental level, building high dimensional splines to approximate their training data, and intentionally trying to minimize the error between the output and the examples. It’s fair to say that “usually” they don’t remember any single training sample, but it’s very easy for NNs to accidentally remember outliers verbatim. The whole reason the lawsuits mentioned in the article are happening is that we keep finding more and more examples where the network has reproduced someone’s specific work in large part. If we’re going to claim that today’s AI is producing original work, then we have to guarantee it, not just assert that it doesn’t usually happen.
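
To make the outlier point concrete, here is a toy sketch. It is not a neural network, just a 1-D curve fit that I’m using as an analogy: the data, `lagrange_fit`, and `line_fit` are all made up for illustration. When the model has enough capacity to drive training error to zero, it reproduces the one odd sample exactly; a lower-capacity fit averages it away.

```python
def lagrange_fit(xs, ys):
    """High-capacity fit: the unique degree-(n-1) polynomial through all n points."""
    def predict(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return predict


def line_fit(xs, ys):
    """Low-capacity fit: ordinary least-squares straight line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


# Hypothetical training set: y = x everywhere except one outlier at x = 2.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [0.0, 1.0, 9.0, 3.0, 4.0]

high_capacity = lagrange_fit(xs, ys)
low_capacity = line_fit(xs, ys)

# How far each model's output is from the outlier it was trained on.
outlier_error_high = abs(high_capacity(2.0) - 9.0)
outlier_error_low = abs(low_capacity(2.0) - 9.0)
print(outlier_error_high, outlier_error_low)
```

Real networks sit somewhere between these two extremes, which is why “it usually doesn’t happen” and “it can’t happen” are very different claims.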

> a rap battle between Keynes and Mises goes beyond a performative remix, it is a transformational work, nothing is copied explicitly.

I don’t buy that the work can be called transformational just because the remix doesn’t have any recognizable snippets. GPT is in fact copying individual words explicitly, and it’s putting words together by studying the statistical occurrence of words in the context of other words.
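
As a rough illustration of what I mean by “statistical occurrence of words in context”, here is the simplest possible version: a bigram model. This is a sketch under heavy assumptions, not how GPT is built (GPT works on subword tokens with a far larger context and a learned network rather than a count table), and the corpus and function names are invented for the example. The point is only that every word it emits, and every adjacent pair, comes straight from the training text.

```python
import random
from collections import Counter, defaultdict


def train_bigrams(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, length, seed=0):
    """Sample up to `length` words, each drawn in proportion to how often
    it followed the previous word during training."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = counts.get(out[-1])
        if not followers:
            break  # the last word never had a successor in the corpus
        choices, weights = zip(*followers.items())
        out.append(rng.choices(choices, weights=weights)[0])
    return out


# Hypothetical one-line training corpus.
corpus = "the market clears when prices adjust and the state spends when prices fall"
counts = train_bigrams(corpus)
sample = generate(counts, "the", 8)
print(" ".join(sample))
```

The output can be a sentence that never appeared in the corpus, yet nothing in it exists that wasn’t lifted from the source.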

> I think that to tackle this we need a new lens other than copyright

I totally agree with that. This question is legitimately hard. We do need a new lens, but we might have to keep and respect the old one too at the same time. I feel like AI work should acknowledge that difficulty and step up to lead the curation of training sets that are legal wrt copyright by design, rather than ignoring the concerns of the very people who made the work they are leveraging.