| HN Mirror

	by newfocogi 480 days ago
	I think this is the correct take. There are other axes to scale on AND I expect we'll see smaller and smaller models approach this level of pre-trained performance. But I believe massive pre-training gains have clearly hit diminishing returns (until I see evidence otherwise).