Hacker News
by tedsanders 40 days ago
For what it's worth, I work at OpenAI and I can guarantee you that we don't switch to heavily quantized models or otherwise nerf them when we're under high load. It's true that the product experience can change over time - we're frequently tweaking ChatGPT & Codex with the intention of making them better - but we don't pull any nefarious time-of-day shenanigans or similar. You should get what you pay for.
3 comments

> we don't switch to heavily quantized models

That sounded like a press bulletin, so just to let you clarify yourself: Does that mean you may switch to lightly quantized models?

There's almost a 0% chance that OpenAI doesn't quantize the model right off the bat.

I am willing to bet large amounts of money that OpenAI would never release a model served as fully BF16 in the year of our lord 2026. That would be insane operationally. They're almost certainly doing QAT to FP4 for FFN, and a similar or slightly larger quant for attention tensors.

It's ok if they never release a BF16 model, but it's less ok if they release it, win the benchmarks, then quantise it after a few weeks.
That would be REALLY easy to detect. It'll be 4x slower.

The tokens/sec of the model is basically directly proportional to the memory bandwidth of the hardware it runs on. So either OpenAI has to gimp model performance for its entire life, or somehow magically speed it up 4x on the first day.
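
For a rough sense of where the 4x comes from, here is a back-of-the-envelope sketch. It assumes decoding is purely memory-bandwidth bound and that every weight is read once per generated token, and it ignores KV cache traffic, batching, and compute limits. The parameter count and bandwidth figures are made-up illustration values, not anything known about OpenAI's models or hardware.

```python
def decode_tokens_per_sec(params_billion, bytes_per_param, bandwidth_gb_s):
    """Rough upper bound on single-stream decode speed.

    Assumes every weight is read from memory once per generated token,
    so time per token = weight bytes / memory bandwidth.
    """
    weight_bytes = params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_s * 1e9 / weight_bytes


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
PARAMS_BILLION = 100.0    # assumed model size
BANDWIDTH_GB_S = 3000.0   # assumed aggregate memory bandwidth

bf16_tps = decode_tokens_per_sec(PARAMS_BILLION, 2.0, BANDWIDTH_GB_S)  # BF16: 2 bytes per weight
fp4_tps = decode_tokens_per_sec(PARAMS_BILLION, 0.5, BANDWIDTH_GB_S)   # FP4: 0.5 bytes per weight

print(f"BF16: {bf16_tps:.1f} tok/s, FP4: {fp4_tps:.1f} tok/s, ratio: {fp4_tps / bf16_tps:.1f}x")
```

Under those assumptions the ratio depends only on bytes per weight (2 / 0.5 = 4), whatever model size and bandwidth you plug in. Real deployments batch requests and mix precisions across tensors, so the observed gap would likely be smaller.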

That is for sure what everyone does. Also, they train on evals, using the datasets that they would be benchmarked against.
What do you mean by this? We don’t train on evals, and if we did I’d quit on the spot.

(The loose version of this that’s true is that there may exist eval data contamination in pretraining. This is a hard problem to fully solve.)

It's not that loose of a version. It's the reality, and it is almost surely a focus of dedicated post-training: RL-ing on these kinds of GitHub repos. Of course you would train specifically on the task. You would mix this eval data with other data from thousands of GitHub repos.
Thanks - let me clarify that we don’t switch to lightly quantized models by time of day or when under heavy load either.

(I used the adjective heavily because that’s what the original post said. I have no intention of making misleading but technically true statements.)

Thank you for your answer. I have a similar question to OP's, but with regard to the GPT models in MS Copilot. My experience is that the response quality is much better when calling the API directly or through the webUI.

I know this might be a question that's impossible for you to answer, but if you can shed any light on this matter, I'd be grateful, as I am doing an analysis of which AI solutions might be suitable for my organisation.

As phrased, the only answer is the question: "as opposed to what?"
WebUIs have giant system prompts built in.

APIs have much smaller ones.

It's very interesting to see that this only happens to American companies. What gives?