The existing models "top out" not because they can't get better, but because making them bigger is uneconomical.

What we do know is that a model "tops out" wrt training data - that is, for a model of a given size, there's only so much training data you can feed it before you stop seeing gains. But conversely it means that if you already have a model of say 1T params that is "trained to capacity", then a model of 2T params needs roughly twice as much training data to fully utilize all those weights. Which means that the cost of training it is not 2x but 4x (twice as many params x twice as many tokens). And then of course serving it is 2x more expensive, but even with optimal training the gains aren't 2x. So it very quickly becomes uneconomical.
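To make the 4x arithmetic concrete, here's a rough back-of-the-envelope sketch. It assumes the common rule of thumb that training compute is about 6 FLOPs per parameter per token, and a fixed "trained to capacity" ratio of 20 tokens per parameter. Both numbers are assumptions for illustration only, and the 4x result doesn't depend on their exact values since they cancel out in the ratio.

```python
# Back-of-the-envelope: how training and serving cost scale with model size.
# ASSUMPTIONS (illustrative, not from the comment above):
#   - training compute ~= 6 FLOPs per parameter per training token
#   - "trained to capacity" means a fixed tokens-per-parameter ratio (20 here)

FLOPS_PER_PARAM_TOKEN = 6.0   # assumed rule-of-thumb constant
TOKENS_PER_PARAM = 20.0       # assumed capacity ratio


def capacity_tokens(params: float) -> float:
    """Tokens needed to train a model of this size 'to capacity'."""
    return TOKENS_PER_PARAM * params


def training_flops(params: float, tokens: float) -> float:
    """Approximate total training compute in FLOPs."""
    return FLOPS_PER_PARAM_TOKEN * params * tokens


small = 1e12  # 1T params
large = 2e12  # 2T params

# Doubling params also doubles the tokens needed, so training cost quadruples.
train_ratio = (training_flops(large, capacity_tokens(large))
               / training_flops(small, capacity_tokens(small)))

# Per-token serving cost scales roughly linearly with parameter count.
serve_ratio = large / small

print(f"training cost ratio: {train_ratio:.1f}x")
print(f"serving cost ratio:  {serve_ratio:.1f}x")
```

Under these assumptions the training ratio comes out to 4x and the serving ratio to 2x, which is the same arithmetic as above: params and tokens each double, and the two factors multiply.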

A good example of that kind of model is (was) GPT-4.5. The prices and the consequent lack of demand show why companies don't really do that sort of thing anymore.

But no, there's no evidence of a plateau as such. I'm not sure what "evidence that we have not reached a plateau" would even look like.