

	by tourmalinetaco 664 days ago
	All of that data is already available; just look into “shadow libraries”. Now, I do wish Meta and other companies would publish their datasets so that we, as humanity, could improve on them and power even better LLMs, but the unfortunate reality is that copyright is holding us back. Most of what you say is essentially gibberish, but there is some truth in it: LLMs would be better if they could not only use their weights but also reference and search their training data (which is collectively owned by humanity, by the way) and answer from that, not just from what they “think”.