HN Mirror



	by jephs 10 days ago
	Scaling curves don't need to be drawn at particularly enormous parameter counts to be useful! If you can do a 300M and 1.2B run (like the authors do here), then you can do 150M, 300M, 600M, and 1.2B runs with only 50% more resources, and get a much better sense for whether effects seem to amplify or diminish as scale increases.
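A quick back-of-the-envelope check of the "50% more" figure, as a sketch only. It assumes training cost is proportional to parameter count (every run sees the same number of tokens), which the comment does not state. The function name and the `exponent` parameter are made up for illustration. If tokens are instead scaled in proportion to model size, cost grows roughly quadratically with parameters and the two extra runs come out cheaper still.

```python
def extra_cost_fraction(base_sizes, added_sizes, exponent=1.0):
    """Extra compute for the added runs as a fraction of the base runs.

    Assumption (not from the thread): cost of one run ~ params ** exponent.
    exponent=1.0 -> same token budget for every model size
    exponent=2.0 -> token budget grows in proportion to model size
    """
    base = sum(n ** exponent for n in base_sizes)
    added = sum(n ** exponent for n in added_sizes)
    return added / base


# The two runs from the paper, and the two runs that fill out the ladder.
base_runs = [300e6, 1.2e9]
added_runs = [150e6, 600e6]

fixed_tokens = extra_cost_fraction(base_runs, added_runs, exponent=1.0)
scaled_tokens = extra_cost_fraction(base_runs, added_runs, exponent=2.0)

print(f"fixed token budget:  {fixed_tokens:.0%} more compute")
print(f"tokens scale with N: {scaled_tokens:.0%} more compute")
```

Under the fixed-token assumption this reproduces the 50% figure (750M extra parameters trained against 1.5B already budgeted); under the scaled-token assumption the overhead is about 25%.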

1 comment

spindump8930 10 days ago

Exactly. Good peer reviewers understand that you can also move down the scaling curve, not just up. It's also laughable to attempt a "yolo" run without first validating a scaling ladder/curve.
