Hacker News
by bobbylarrybobby 310 days ago
My guess is that at LLM scale, you really can't do hyperparameter tuning; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (through data and RL).