| HN Mirror

	by ricardobeat 210 days ago
	It’s quite unlikely that the training data includes duplicate repositories or even forks; counting those alone would surpass the published dataset sizes.