Hacker News
by ricardobeat, 210 days ago
It’s quite unlikely that the training data includes duplicate repositories or even forks; those alone would exceed the published dataset sizes.