Hacker News
by 99failures 1256 days ago
So really it boils down to attribution. If GitHub were to disclose attribution to all the copyright owners of the code used to train the model, then this issue would be mute.

It will be a long list, but just a list.

2 comments

It would be very difficult to track what record in the training set contributed to what weight adjustment, especially after all the tokenization that is done.

s/mute/moot/; moot: having little or no practical relevance, typically because the subject is too uncertain to allow a decision.

And there is a list, right?

https://github.com/search?q=license%3Acc0-1.0&type=Repositor...

All repos with a particular license.
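
For anyone who wants that list programmatically rather than through the web search page, here is a minimal sketch. It assumes GitHub's public REST repository-search endpoint and its `license:` search qualifier; `license_search_url` is a hypothetical helper name, not part of any library. It only builds the request URL and does not send anything, and a complete list would still need paging through the results.

```python
from urllib.parse import urlencode

# Assumed endpoint: GitHub's public REST search for repositories.
SEARCH_ENDPOINT = "https://api.github.com/search/repositories"


def license_search_url(license_key: str, per_page: int = 30, page: int = 1) -> str:
    """Build a repository-search URL filtered by a license key such as 'cc0-1.0'.

    Hypothetical helper for illustration; it does not perform the request.
    """
    query = {"q": f"license:{license_key}", "per_page": per_page, "page": page}
    # urlencode percent-encodes the ':' in the qualifier as %3A.
    return f"{SEARCH_ENDPOINT}?{urlencode(query)}"


if __name__ == "__main__":
    print(license_search_url("cc0-1.0"))
```

Note this only lists repos by declared license. It says nothing about which of them ended up in the training set.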