by OneDeuxTriSeiGo 451 days ago
Why should it be fair use? Why would being a derivative work not be OK? There is a massive corpus of public domain and FOSS works, as well as plenty of permissively licensed, government-created datasets. There is no reason a corpus built from these sources would be insufficient.

> Why would being a derivative work not be OK?

That's not even the real problem. It's a problem, yes, but not the real problem. The problem is that before they could train the model on the book, they had to copy the book from somewhere. Is it ok to make illegal pirated copies of a copyrighted book to train your model? I think that's the issue we are dealing with here.

Whether it is ok to create a derivative work or not is beside the point.

> The problem is that before they could train the model on the book, they had to copy the book from somewhere.

That, in itself, raises kind of an interesting point.

Right now there's a post on the front page where people are expressing conspicuous outrage because ChatGPT rendered a good Indiana Jones likeness in response to a vague query asking for a 1930s archaeologist with a bullwhip. Was that particular response generated by ChatGPT because it "copied" Indiana Jones? Or because it was influenced by the same pulp fiction stories and deeply embedded cultural archetypes that led Spielberg and Lucas to create the character in the first place?