by bdamm 11 days ago
Is this a known or quantifiable thing? I thought that the limit had already been determined, i.e. the existing models top out, and at some point it doesn't matter how much time or energy you let the model consume, it won't improve the result. And with regard to training parameters, I thought we were equally limited there, e.g. the existing models can't benefit from a larger parameter space.

I was under the impression that improvements are arriving via how the models are trained and how model prompting context is constructed, rather than just by how much data or how much energy is spent searching over the model space for a particular prompt.

Is there some evidence that we have not reached a plateau with just resource consumption on existing models?

1 comment

The existing models "top out" not because they don't get better, but because scaling them further is uneconomical.

What we do know is that a model "tops out" wrt training data - that is, for a model of a given size, there's only so much training data you can feed it before you stop seeing gains. But conversely, it means that if you already have a model of, say, 1T params that is "trained to capacity", then a model of 2T params needs roughly twice as much training data to fully utilize all those weights. Which means that the cost of training it is not 2x but 4x (twice as many params x twice as many tokens). And then of course serving it is 2x more expensive, but even with optimal training the gains aren't 2x. So it very quickly becomes uneconomical.
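
To make the 4x vs 2x arithmetic concrete, here's a rough back-of-the-envelope sketch. It assumes the commonly used rule of thumb that training compute is about 6 x params x tokens and that inference costs about 2 x params per token. The 20 tokens-per-param "trained to capacity" ratio and the 1T / 2T model sizes are hypothetical placeholders of mine, not measured figures; the ratios come out the same for any fixed tokens-per-param value.

```python
# Back-of-the-envelope scaling cost sketch.
# Assumption: training FLOPs ~ 6 * params * tokens (a common rule of thumb).
# Assumption: inference FLOPs per generated token ~ 2 * params.
TRAIN_FLOPS_PER_PARAM_TOKEN = 6
INFER_FLOPS_PER_PARAM = 2

# Hypothetical "trained to capacity" ratio; the final ratios do not depend on it.
TOKENS_PER_PARAM = 20


def training_flops(params: float, tokens: float) -> float:
    """Approximate total compute to train a model on `tokens` tokens."""
    return TRAIN_FLOPS_PER_PARAM_TOKEN * params * tokens


def inference_flops_per_token(params: float) -> float:
    """Approximate compute to serve one token."""
    return INFER_FLOPS_PER_PARAM * params


small_params = 1e12  # hypothetical 1T-param model
large_params = 2e12  # hypothetical 2T-param model

# Each model gets training data in proportion to its size.
small_train = training_flops(small_params, TOKENS_PER_PARAM * small_params)
large_train = training_flops(large_params, TOKENS_PER_PARAM * large_params)

train_ratio = large_train / small_train
serve_ratio = inference_flops_per_token(large_params) / inference_flops_per_token(small_params)

print(f"training cost ratio: {train_ratio:.1f}x")
print(f"serving cost ratio:  {serve_ratio:.1f}x")
```

Training cost grows with the square of model size when data is scaled along with it, while serving cost grows linearly, which is the asymmetry described above.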

A good example of that kind of model is (was) GPT-4.5. The prices and the consequent lack of demand show why companies don't really do that sort of thing anymore.

But no, there's no evidence of a plateau as such. I'm not sure what "evidence that we have not reached a plateau" would even look like.