Hacker News
by dannyw 7 days ago
Yes, OpenAI admits that, at least for user welfare, they silently reroute sensitive requests to different models: https://openai.com/index/building-more-helpful-chatgpt-exper...

The implementation was so borked, SamA went back on Reddit and apologised: https://old.reddit.com/r/ChatGPT/comments/1o6jins/updates_fo...

Model re-routing happens for coding tasks too. For example, OpenAI's support pages used to mention (at least as of a month ago, when I checked) that if they automatically use a cheaper -mini model to accomplish the task behind the scenes, you’ll be charged -mini prices even if you selected a more expensive model. I just checked again and they’ve removed it, but there are probably archives.

Finally, even if they’re the same weights, you don’t know what quantisation you’re running at. Adaptive quantisation based on load (given workday peaks), or similar techniques, have been happening since the ChatGPT 3.5 days; the techniques are probably more advanced now.

1 comment

What you linked appears to be about GPT-5's Auto router in the ChatGPT app at the time, which supposedly would choose the 'good' model over the pretty bad Instant model for mental health requests.

That's pretty far from the hypothesis that either OpenAI or Anthropic is using adaptive quantisation based on load for their professional coding agent tools.

That is what I believe engineers working at OpenAI have explicitly denied, and for which we have seen zero evidence so far.

Many people seem to believe it anyway, but the non-deterministic nature of the tools appears to be the more plausible explanation for perceived degradation, in my opinion.