by suttontom 13 days ago

You're wrong in lots of ways.

Some model cards do show regressions on benchmarks for newer models on specific tasks: https://storage.googleapis.com/deepmind-media/Model-Cards/Ge...

This wasn't a new model, but updates that the numbers say are improvements can still make a model worse: https://openai.com/index/sycophancy-in-gpt-4o/

The slight increases in performance/benchmarks may be just noise: https://arxiv.org/pdf/2602.07150
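To make the noise point concrete, here's a rough back-of-the-envelope sketch. It is not the method from the linked paper. It assumes each benchmark question is an independent pass/fail trial and that the two models are scored independently, and the function names, scores, and question counts are all made up for illustration.

```python
import math


def score_standard_error(accuracy: float, n_questions: int) -> float:
    """Binomial standard error of an accuracy measured on n_questions items."""
    return math.sqrt(accuracy * (1.0 - accuracy) / n_questions)


def gap_within_noise(acc_old: float, acc_new: float, n_questions: int, z: float = 1.96) -> bool:
    """True if the score gap is no larger than z standard errors of the difference.

    Assumption: both scores come from the same number of questions and are
    treated as independent samples (a simplification; paired tests are tighter).
    """
    se_gap = math.sqrt(
        score_standard_error(acc_old, n_questions) ** 2
        + score_standard_error(acc_new, n_questions) ** 2
    )
    return abs(acc_new - acc_old) <= z * se_gap


# Hypothetical numbers: a 2-point gain on a 500-question benchmark
# versus the same gain on a 50,000-question benchmark.
small_benchmark = gap_within_noise(0.80, 0.82, n_questions=500)
large_benchmark = gap_within_noise(0.80, 0.82, n_questions=50_000)
print(small_benchmark, large_benchmark)
```

Under those assumptions, the 2-point bump on 500 questions sits inside the roughly 95% noise band, while the same bump on 50,000 questions does not. Benchmark size matters a lot when reading small deltas.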