| HN Mirror



	by keeganpoppen 13 days ago
	the output is definitely better. and i find it crazy how every time a new model comes out people trip over themselves to say how much worse it is than previous models, when in fact that is basically an impossibility. like, they've got the numbers, man-- you only release a new model when the numbers get gooder. the burden of proof is on the "didn't get better" side, not the "prove that it's better" side, because the architecture itself (1) only works because of how giant the training data / eval / etc. sets are and (2) has a fractal property of becoming strictly deeper and more thoughtful when you just click and drag the edge up and to the right (obviously AI research is harder than this, but that doesn't make the general point untrue). i say this especially because the scuttlebutt is that this model genuinely is a shift-click-expand more so than any sort of architectural "new science" or anything. this is exactly why hypotheses come before the experiment in the scientific method.

1 comment

You're wrong in lots of ways.

Some model cards do show regressions on benchmarks for newer models on specific tasks: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...

This wasn't a new model, but it shows that an update backed by better numbers can still make a model worse: https://openai.com/index/sycophancy-in-gpt-4o/

The slight increases in performance/benchmarks may be just noise: https://arxiv.org/pdf/2602.07150
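
As a rough illustration of the noise point (not taken from the linked paper; the benchmark size and scores below are made up), a two-proportion z-score gives a quick sense of whether a small accuracy gain clears sampling error. It treats the two runs as independent samples, which is a simplifying assumption, since both models are usually scored on the same questions:

```python
import math

def accuracy_gain_z(correct_old, correct_new, n_items):
    """Approximate z-score for the accuracy gap between two models.

    Assumes each model was scored on n_items questions and treats the
    two results as independent binomial samples (a simplification).
    """
    p_old = correct_old / n_items
    p_new = correct_new / n_items
    # Standard error of the difference between two proportions.
    se = math.sqrt(p_old * (1 - p_old) / n_items + p_new * (1 - p_new) / n_items)
    if se == 0:
        raise ValueError("both accuracies are exactly 0 or 1; z-score is undefined")
    return (p_new - p_old) / se

# Hypothetical numbers: a 500-question benchmark, 70.0% -> 72.0%.
z = accuracy_gain_z(350, 360, 500)
print(f"z = {z:.2f}")  # |z| < 1.96 means the gap is within noise at the 95% level
```

With these made-up numbers the two-point gain gives a z-score well under 1.96, so on a benchmark that small it would not be distinguishable from noise.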