
MoE is mostly an optimization that reduces the number of active parameters per token and therefore the compute requirements, but in some cases it can also provide performance improvements over dense models.
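To make the "active parameters" point concrete, here's a minimal sketch of top-k expert routing with a made-up layer config (8 experts, top-2 routing, FFN sizes 1024/4096; these are hypothetical numbers, not any real model). Each token only passes through the experts the router picks, so only a fraction of the expert weights do any work for that token:

```python
import math
import random

# Hypothetical MoE layer config, purely illustrative (not a real model's numbers).
NUM_EXPERTS = 8
TOP_K = 2
D_MODEL = 1024
D_FF = 4096


def expert_params(d_model: int, d_ff: int) -> int:
    """Weights in one feed-forward expert with an up and a down projection (biases ignored)."""
    return 2 * d_model * d_ff


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def route(router_logits, k):
    """Pick the top-k experts for one token and renormalize their gate weights."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    gates = softmax([router_logits[i] for i in top])
    return list(zip(top, gates))


# All experts are stored, but only TOP_K of them run for any given token.
total = NUM_EXPERTS * expert_params(D_MODEL, D_FF)
active = TOP_K * expert_params(D_MODEL, D_FF)

# Fake router scores for a single token (a real router computes these from the token's hidden state).
random.seed(0)
logits = [random.gauss(0.0, 1.0) for _ in range(NUM_EXPERTS)]
chosen = route(logits, TOP_K)

print(f"experts used for this token: {[i for i, _ in chosen]}")
print(f"active / total expert params: {active:,} / {total:,} = {active / total:.0%}")
```

With these numbers only 25% of the expert weights are touched per token, which is where the compute savings come from. Memory is a different story: all experts still have to be loaded.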

I would not describe reasoning as an optimization. In fact, it typically does the opposite: models spend way more tokens (and therefore compute) responding to the same prompt. Some of the smartest models these days use ridiculous amounts of reasoning before they ever respond. Try Deep Research in Gemini or Claude and you'll see what I'm talking about.
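A back-of-the-envelope sketch of why reasoning costs more, using the common rule of thumb that a decoder forward pass costs roughly 2 × N FLOPs per generated token (N = active parameters). The model size and token counts below are hypothetical, and the estimate ignores attention over long contexts and prompt processing, both of which make long reasoning traces even more expensive:

```python
# Rule of thumb: ~2 * N FLOPs per generated token for N active parameters.
# This ignores attention cost over the growing context, so it underestimates long outputs.
ACTIVE_PARAMS = 30e9  # hypothetical model size


def generation_flops(active_params: float, output_tokens: int) -> float:
    """Approximate FLOPs to generate output_tokens tokens."""
    return 2 * active_params * output_tokens


ANSWER_TOKENS = 500         # hypothetical length of the visible answer
REASONING_TOKENS = 20_000   # hypothetical hidden reasoning trace

direct = generation_flops(ACTIVE_PARAMS, ANSWER_TOKENS)
with_reasoning = generation_flops(ACTIVE_PARAMS, ANSWER_TOKENS + REASONING_TOKENS)

print(f"reasoning run costs ~{with_reasoning / direct:.0f}x the direct answer")  # ~41x
```

Same model, same prompt, roughly 41 times the compute under these assumptions, which is the opposite of an optimization.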

>> But this seems to be a very clear path to be "taking the car to the carwash by foot" for a long time, isn't it?

I thought progress was plateauing sometime last year too, but then new models were released and we saw that the improvements in multilingual capabilities are real. And if you want something more tangible and widely reported, consider the Opus 4.5/4.6 coding revolution (the Claude Code explosion) a few months back.

Since LLMs are stochastic, statistical machines, people will always come up with funny things that trick them, be it counting the R's in "strawberry" or taking the car to the carwash by foot. At the same time, I can tell you from experience that many of the Misguided Attention ( https://github.com/cpldcpu/MisguidedAttention ) style trick questions stump newer models much less often. Progress is being made; it's just not in visible areas.

BTW, you can come up with many trick questions that will stump even humans with PhDs. They will be of a different kind than the ones for LLMs, but falling for trick questions is not a flaw unique to LLMs.

If you're asking whether progress toward AGI is taking too long, then I personally think LLMs, at least with their current architecture, are not the foundation of AGI and will always have inherent limitations. But we're fully in "that's just, like, your opinion, man" territory now :)