Hacker News
by bcjdjsndon 1 day ago
Problem is there's a lot more than a single repo in training data, the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?

Honestly? Yes. This is why it's such a problem that most of the training data was used without permission, and without the correct copyright status or license associated with it.

There are a lot of arguments about humans doing the same thing, but the reality is that humans and robots don't enjoy the same legal protection. It's clearly a derivative work of all of its training data.

> Honestly? Yes.

Then it works both ways. Say I manage to generate what is essentially a ripoff of your copyrighted song, release it, and make a ton of money. You now have to split that royalty with Keyboard Cat. And Joe Bloggs. You'd end up with fractions of pennies.