| HN Mirror



	by reubenmorais 1147 days ago
	The number of data points is tiny. Only a handful of LLMs in the world have been trained from scratch, and the models released in any one "generation" tend to be close to each other in size. The field is very open source, so people all over are building on top of the same shared literature. Plus I'm sure leaks happen often, and companies then rush to train their own pet architecture at whatever parameter size the competition is about to release.