Hacker News
by k2enemy 1228 days ago
But that's also not how copyright works. At least in the United States, the protections offered by copyright center around reproduction, performance, and derivative works[1].

If the AI models are reproducing copyrighted works, then that's a problem. And it does look like there are some examples where that might be happening beyond notions of fair use. But slurping up copyrighted content to train a model seems to fall under allowed use.

[1] https://www.copyright.gov/what-is-copyright/


US law only applies to the US, and what is legal is not the same as what is moral.

As for fair use, this is not the same as someone remixing or sampling songs, or writing fan fiction or satire or quoting works for criticism.

People do not want models trained on their creative works, so that someone else can make money using those models to produce similar creative works as a service for third parties.

While it is possible to create similar creative works (and I will grant that that could be a prima facie problem), it is also possible to make rather new creative works. Just like you can do by hand, you can interpolate and extrapolate from known starting points, and there is nothing stopping you from coming up with something totally unique.

Training a model maybe, but is it clear that the output of the model isn't a derivative work?

To about the same degree as the output of a human.

I just started writing a new novel. It's an interesting, in my opinion highly novel fantasy/SF(ish) story, for once not fanfiction of anything that's still in copyright -- most people wouldn't count stories based on ancient Norse mythology as 'fanfiction' -- but that doesn't mean it isn't derivative. It means, instead of naming two or three things it's derivative of, I can name ten to fifteen.

That's normal. All stories are derivative, and if you point me at an author who claims theirs aren't, you're pointing at a liar. The job of an author is to put the building blocks together in a new and interesting form, not to make them up from whole cloth. It's impossible to invent more than two or three truly novel ideas per day, even if you're incredibly imaginative, and most of those won't be any good.

The difference between humans and AIs, nowadays, seems to be that the AIs use millions of sources instead of ten to fifteen. Or, alternately, that they use none -- and theirs is less derivative -- because certainly everything I've ever read goes into my writing, not just the things I recognise I'm using.

>To about the same degree as the output of a human.

No. Full stop. Humans aren't stochastic parrots. Pointing to a lack of understanding about what exactly happens in the human mind is, FULL STOP, not evidence that LLMs are doing the same things humans do.

This being HN, I get to be pedantic ;-).

Humans are not stochastic, they're obviously chaotic[1]. Which is to say: not parrots at all.

Some of the modern models I've seen seem to be chaotic too, though, so that's interesting [2]. I'm going to assume LLMs probably exhibit the same properties.

[1] https://en.wikipedia.org/wiki/Chaos_theory (Chaotic systems sometimes seem to be stochastic, but they're actually much stranger and more interesting!)

[2] I've been messing with Stable Diffusion to get a feel for (and/or avoid) tipping points: that is to say, points in latent space where the model becomes very sensitive to small changes in initial parameters. You can find instances fairly quickly even by hand by doing a bisection search.
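
For anyone who wants to try the bisect idea from [2], here is a minimal sketch in Python. It does not call Stable Diffusion at all: `toy_label` is a made-up stand-in for "generate an image at interpolation weight t between two latents and describe what came out", and the 0.37 threshold is invented for illustration. The function name `find_tipping_point` and its parameters are also my own assumptions. It only brackets one change between the endpoints, not every one.

```python
from typing import Callable, Hashable, Tuple


def find_tipping_point(
    label_at: Callable[[float], Hashable],
    lo: float = 0.0,
    hi: float = 1.0,
    tol: float = 1e-6,
) -> Tuple[float, float]:
    """Narrow [lo, hi] around one point where label_at changes value.

    Assumes label_at(lo) != label_at(hi). Returns an interval no wider
    than tol whose two ends still give different labels.
    """
    label_lo = label_at(lo)
    if label_at(hi) == label_lo:
        raise ValueError("endpoints give the same label; nothing to bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if label_at(mid) == label_lo:
            lo = mid  # change happens to the right of mid
        else:
            hi = mid  # change happens at or to the left of mid
    return lo, hi


def toy_label(t: float) -> str:
    # Hypothetical stand-in for a real model: the output flips at t = 0.37.
    return "cat" if t < 0.37 else "dog"


lo, hi = find_tipping_point(toy_label)
print(f"output changes somewhere in [{lo:.7f}, {hi:.7f}]")
```

With a real image model the "label" would have to be something like a classifier result or a thresholded distance between outputs, and each evaluation costs a full generation, which is why halving the interval each step is attractive.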

>[2]. I'm going to assume LLMs probably exhibit the same properties.

That's quite an assumption to make.

They use very similar technology, so it's not a large leap.

I don't think I ever claimed that?

That's not my argument. My argument is that the anti-AI arguments, as spoken, also match what I know I'm doing as a human. In my opinion they match that better than they match what the AIs are doing, because as you say, they aren't human.

Maybe the output isn't, but what the LLM turns the work into when it becomes a constituent element of the model is probably a derivative work.