| HN Mirror



	by bcjdjsndon 1 day ago
	Problem is there's a lot more than a single repo in the training data; the corpus is massive... Should the author of a blog post on cats also be compensated for simply being in the same training data as the git repo?


20k 1 day ago

Honestly? Yes. This is why it's such a problem that most of the training data was used without permission, and without the correct copyright status or license associated with it.

There are a lot of arguments about humans doing the same thing, but the reality is that humans and robots don't enjoy the same legal protection. It's clearly a derivative work of all of its training data.


bcjdjsndon 17 hours ago

> Honestly? Yes.

Then it works both ways. Say I manage to generate what is essentially a ripoff of your copyrighted song, release it, and make a ton of money: you now have to split that royalty with Keyboard Cat. And Joe Bloggs. You'd end up with fractions of pennies.
