| HN Mirror



	by ForceBru 26 days ago
	Right, isn't double descent one of the reasons why modern Extremely Large Language Models work at all? I think I heard somewhere that basically all of today's "smart" LLMs (reasoning, solving math problems, etc.) are trained in "double descent" territory (whatever that means, I'm not entirely sure).

2 comments

mxwsn 26 days ago

No, there are more training tokens than parameters in LLMs. They are in the classical first descent setting.


SiempreViernes 26 days ago

No, double descent is a symptom of whatever it is that makes the deep models work at all. It's just the name for something you see happen when it works. The reason it works has something to do with how all those extra dimensions work as a regularisation term in the fit.
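
For anyone who wants to see the curve rather than take it on faith, here's a minimal toy sketch (nothing to do with how an actual LLM is trained). It fits random ReLU features with minimum-norm least squares and sweeps the number of features past the number of training points. Everything in it is an assumption picked for illustration: the function name `double_descent_curve`, the linear target, the noise level, and the widths. `np.linalg.pinv` returns the smallest-norm coefficient vector among all that fit the training data, which is one common way to make the "extra dimensions act as regularisation" idea concrete.

```python
import numpy as np


def double_descent_curve(widths, n_train=40, n_test=500, d=5,
                         noise=0.5, trials=20, seed=0):
    """Mean test MSE of min-norm least squares on random ReLU features.

    Hypothetical toy setup: noisy linear target, Gaussian inputs.
    Returns a dict mapping feature count p -> test MSE averaged over trials.
    """
    rng = np.random.default_rng(seed)
    errors = {p: 0.0 for p in widths}
    for _ in range(trials):
        w_true = rng.normal(size=d)
        X_train = rng.normal(size=(n_train, d))
        X_test = rng.normal(size=(n_test, d))
        y_train = X_train @ w_true + noise * rng.normal(size=n_train)
        y_test = X_test @ w_true  # noiseless targets for evaluation
        for p in widths:
            # Fixed random first layer; only the linear readout is fitted.
            W = rng.normal(size=(d, p)) / np.sqrt(d)
            phi_train = np.maximum(X_train @ W, 0.0)
            phi_test = np.maximum(X_test @ W, 0.0)
            # Pseudo-inverse gives the minimum-norm least-squares solution:
            # ordinary least squares when p < n_train, and the smallest-norm
            # interpolating solution when p > n_train.
            beta = np.linalg.pinv(phi_train) @ y_train
            errors[p] += np.mean((phi_test @ beta - y_test) ** 2) / trials
    return errors


curve = double_descent_curve([5, 10, 20, 40, 80, 320, 1280])
for p, err in curve.items():
    print(f"features={p:5d}  test MSE={err:.3f}")
```

In this setup the test error should spike around 40 features (the point where the model can just barely interpolate the 40 training points) and come back down as features keep being added. The exact numbers depend on the seed and the assumed settings, so treat it as a demonstration of the shape, not a measurement.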
