HN Mirror

by otabdeveloper4 43 days ago

> higher param count models will remain smarter for a looong time

They're not smarter, they just know more stuff.

You probably don't need knowledge about Pokemon or the Diamond Sutra in your enterprise coding LLM.

The "smarts" come from post-training, especially around tool use.

2 comments

CamperBob2 42 days ago

> You probably don't need knowledge about Pokemon or the Diamond Sutra in your enterprise coding LLM.

That's one of the biggest remaining head-scratchers in this whole business. You do need all that unrelated stuff to make a good coding model.

Nobody knows why you can't build a coding model by training on nothing but code, CS texts, specifications, and case studies, but so far it appears that you can't.

otabdeveloper4 42 days ago

This one is kind of obvious - because people prompt coding LLMs with natural language. That's unrelated to stuffing the pre-train set with trivia factoids.

An LLM that knows English very well doesn't need to be very large, and certainly not hundreds of billions of parameters.

anon7725 43 days ago

If the smarts came from post-training, we could show significant gains by doing that post-training again for previous generations of models. But we know that isn't happening: effective post-training is necessary but not sufficient for model performance.

otabdeveloper4 42 days ago

> we could show significant gains by doing that post-training again for previous generations of models

That's what Chinese models are doing, and beating Opus et al.
