Another take is that the base models are now good enough that spending more money for more intelligence is viable at test time. A threshold has been crossed.
Naively, I feel that to be useful, LLMs should aim to become more power efficient, so that eventually all devices can be smarter.
Power efficiency can be gained through less test-time compute, more "intelligence," or some combination of the two. I'm not convinced these SOTA models are doing much more than increasing test-time compute.
The biggest gains in power efficiency will come from advances in process node size and transistor type, such as nanosheet or forksheet transistors. Algorithms will help only a little.