HN Mirror

Hacker News


	by Scene_Cast2 310 days ago
	I find it interesting that the architectures of modern open-weight LLMs are so similar, and that most innovation seems to be happening on the training front (data, RL). This is contrary to what I've seen in a large ML shop, where architectural tuning was king.

2 comments

bobbylarrybobby 310 days ago

My guess is that at LLM scale, you really can't afford to do hyperparameter tuning; it's just too expensive. You probably have to do some basic testing of different architectures, settle on one, and then figure out how to make the best use of it (data and RL).
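A rough back-of-envelope helps show why. A common approximation puts dense-transformer training compute at about 6·N·D FLOPs (N parameters, D training tokens), so a naive sweep that trains every configuration at full scale multiplies the bill by the number of configurations. The model size, token count, and sweep size below are hypothetical, picked only for illustration.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate dense-transformer training compute: C ~ 6 * N * D FLOPs."""
    return 6.0 * n_params * n_tokens


def sweep_flops(n_params: float, n_tokens: float, n_configs: int) -> float:
    """Compute for a naive sweep that trains every config at full scale."""
    return n_configs * training_flops(n_params, n_tokens)


# Hypothetical numbers: a 70B-parameter model, 15T tokens, a 20-config sweep.
single = training_flops(70e9, 15e12)
sweep = sweep_flops(70e9, 15e12, 20)
print(f"one run:    {single:.2e} FLOPs")
print(f"20 configs: {sweep:.2e} FLOPs")
```

Even a modest sweep turns one very expensive run into twenty, which is why a big change is more likely to be tested at small scale and then locked in.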


ModelForge 310 days ago

Good point. LLMs lower the barrier to entry for anyone with enough resources, because these architectures are robust to tweaks as long as one throws enough compute and data at them. You can even violate compute-optimal scaling laws and still get a good model (as Llama 3 showed back then).
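To put a number on "violating scaling laws": the Chinchilla rule of thumb is roughly 20 training tokens per parameter for compute-optimal training, and Meta reported pretraining Llama 3 on over 15T tokens. The sketch below applies those rounded figures to the 8B model; the 20:1 ratio is an approximate heuristic, not an exact law.

```python
CHINCHILLA_TOKENS_PER_PARAM = 20  # rough compute-optimal heuristic


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """Training tokens seen per model parameter."""
    return n_tokens / n_params


def chinchilla_optimal_tokens(n_params: float) -> float:
    """Token budget the ~20:1 heuristic suggests for a given model size."""
    return CHINCHILLA_TOKENS_PER_PARAM * n_params


n_params = 8e9    # Llama 3 8B
n_tokens = 15e12  # "over 15T tokens", rounded down
ratio = tokens_per_param(n_params, n_tokens)
optimal = chinchilla_optimal_tokens(n_params)
overtrain = n_tokens / optimal
print(f"tokens/param: {ratio:.0f}")
print(f"Chinchilla-optimal tokens: {optimal:.2e}")
print(f"overtraining factor: {overtrain:.0f}x")
```

Chinchilla-optimal minimizes loss for a fixed training budget; training a smaller model far longer than that spends extra training compute in exchange for a model that is cheaper to run at inference time.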
