Hacker News


	by solenoid0937 116 days ago
	They have not. Every successful pre-train as of late has shown performance increases greater than what the scaling laws predict.

1 comment

Those gains are architecture-based, data-quality-based, etc. Scaling laws only relate performance to data volume and compute, holding other factors constant.
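A rough sketch of that distinction follows. It assumes a Chinchilla-style functional form L(N, D) = E + A/N^α + B/D^β, where N is parameter count and D is training tokens. Together these fix training compute, commonly approximated as C ≈ 6ND. The constants are illustrative assumptions, not a fit to any real model. The helper `loss_with_better_recipe` is hypothetical: it treats a better architecture or cleaner data as if each parameter or token were worth more than before, which is a simplification chosen only to show the effect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScalingLaw:
    """Chinchilla-style loss fit: L(N, D) = E + A / N**alpha + B / D**beta.

    All constants are illustrative assumptions, not a fit to any real model.
    """
    E: float = 1.69      # irreducible loss (assumed)
    A: float = 406.4     # parameter-term coefficient (assumed)
    B: float = 410.7     # data-term coefficient (assumed)
    alpha: float = 0.34  # parameter exponent (assumed)
    beta: float = 0.28   # data exponent (assumed)

    def predict(self, n_params: float, n_tokens: float) -> float:
        # Loss depends only on model size and data volume; architecture,
        # data quality, optimizer, etc. are frozen into the constants above.
        return self.E + self.A / n_params**self.alpha + self.B / n_tokens**self.beta


def loss_with_better_recipe(law: ScalingLaw, n_params: float, n_tokens: float,
                            param_efficiency: float = 1.0,
                            data_efficiency: float = 1.0) -> float:
    """Hypothetical model of a recipe change (better architecture or cleaner data).

    Each new parameter/token is treated as worth `param_efficiency` /
    `data_efficiency` old ones. This is an illustrative assumption, not an
    established way to quantify such gains.
    """
    return law.predict(n_params * param_efficiency, n_tokens * data_efficiency)


if __name__ == "__main__":
    law = ScalingLaw()    # assumed to be fitted on the *old* recipe
    n, d = 7e9, 1.4e12    # hypothetical 7B-parameter model on 1.4T tokens
    predicted = law.predict(n, d)
    observed = loss_with_better_recipe(law, n, d,
                                       param_efficiency=1.5, data_efficiency=2.0)
    print(f"predicted by old fit: {predicted:.3f}")
    print(f"with better recipe:   {observed:.3f}")
```

At the same N and D, the improved recipe lands below the old fit's prediction. That matches "beating the scaling law" even though nothing about scaling itself changed. Refitting the same functional form on runs from the new recipe would simply produce new constants.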