| HN Mirror



	by bobbylarrybobby 310 days ago
	My guess is that at LLM scale, you really can't tune hyperparameters: each full training run is just too expensive to repeat many times. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).
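
	A back-of-the-envelope sketch of why a sweep gets expensive, assuming the common rule of thumb that training costs roughly 6 * N * D FLOPs (N parameters, D tokens). The model size, token count, and grid sizes below are hypothetical and chosen only for illustration; they are not figures from any real training run.

```python
from math import prod


def training_flops(n_params: float, n_tokens: float) -> float:
    """Rough cost of one training run, using the ~6 * N * D approximation."""
    return 6.0 * n_params * n_tokens


def grid_search_flops(n_params: float, n_tokens: float, grid_sizes: list[int]) -> float:
    """Cost of a naive grid search: one full training run per grid point."""
    n_runs = prod(grid_sizes)  # grid points multiply across hyperparameters
    return n_runs * training_flops(n_params, n_tokens)


if __name__ == "__main__":
    # Hypothetical setup: 70B parameters, 1.4T tokens, and a small grid over
    # three hyperparameters with 5, 4, and 3 candidate values each.
    n_params, n_tokens = 70e9, 1.4e12
    grid = [5, 4, 3]

    one_run = training_flops(n_params, n_tokens)
    sweep = grid_search_flops(n_params, n_tokens, grid)

    print(f"one run: {one_run:.2e} FLOPs")
    print(f"sweep:   {sweep:.2e} FLOPs ({prod(grid)} full runs)")
```

	Even a small grid multiplies the cost of a full run by the number of grid points, which is presumably why tuning would be done on much smaller proxy models, if at all.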