HN Mirror

by user43928 7 days ago

Has there been any evidence of a well known provider rerouting to lower quality models?

Last I saw, engineers working at OpenAI denied this on HN.

I saw that someone set up a tracker that aims to record the performance of the models, and so far it has not shown any statistically significant deviation in performance for Codex, and not yet enough data for Claude: https://marginlab.ai/trackers/codex/
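As a rough illustration of what "no statistically significant deviation" can mean for a tracker like this, here is a minimal sketch of a two-proportion z-test comparing benchmark pass rates from two periods. This is an assumption for illustration only: the thread does not describe the tracker's actual method, and the function name and the sample counts are hypothetical.

```python
import math


def two_proportion_z_test(pass_a, n_a, pass_b, n_b):
    """Return (z, two-sided p-value) under H0: both periods share one pass rate.

    pass_a / n_a: passed and total tasks in the baseline period (hypothetical).
    pass_b / n_b: passed and total tasks in the recent period (hypothetical).
    """
    p_a = pass_a / n_a
    p_b = pass_b / n_b
    # Pooled pass rate, assuming the null hypothesis (no change) is true.
    pooled = (pass_a + pass_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # All runs passed or all failed in both periods: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical numbers: 70% vs 65% pass rate on 200 tasks each.
z, p = two_proportion_z_test(140, 200, 130, 200)
print(f"z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts, a five-point drop in pass rate on 200 tasks per period gives a p-value well above 0.05, so it would not count as a significant deviation. That is one reason run-to-run noise can look like degradation without a large sample.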

2 comments

dannyw 7 days ago

Yes, OpenAI admits they silently reroute sensitive requests to different models for user welfare at least: https://openai.com/index/building-more-helpful-chatgpt-exper...

The implementation was so borked, SamA went back on Reddit and apologised: https://old.reddit.com/r/ChatGPT/comments/1o6jins/updates_fo...

Model re-routing happens for coding tasks too. For example, OpenAI's support pages used to mention (at least 1 month ago, when I checked) that if they automatically use a cheaper -mini model to accomplish the task behind the scenes, you’ll be charged -mini prices even if you selected a more expensive model. I just checked again and they’ve removed it, but there are probably archives.

Finally, even if they’re the same weights, you don’t know what quantisation you’re running at. Adaptive quantisation based on load (given workday peaks), or similar techniques, have been happening since the ChatGPT 3.5 days; the techniques are probably more advanced now.

user43928 7 days ago

What you linked appears to relate to GPT-5's Auto router in the ChatGPT app at the time, which would supposedly choose the 'good' model over the pretty bad Instant model for mental health requests.

That's pretty far from the hypothesis that either OpenAI or Anthropic is using adaptive quantisation based on load for their professional coding agent tools.

This is what I think engineers working at OpenAI explicitly denied, and for which we have seen zero evidence yet.

Many people seem to believe it anyway, but the non-deterministic nature of the tools appears to be the more plausible explanation for perceived degradation, in my opinion.

cube00 7 days ago

> Has there been any evidence of a well known provider rerouting to lower quality models?

The firm [Anthropic] would deliberately degrade the model’s performance in ways that were invisible to the user.

https://news.ycombinator.com/item?id=48485958
