HN Mirror



	by jychang 36 days ago
	There's almost 0% chance that OpenAI doesn't quantize the model right off the bat. I am willing to bet large amounts of money that OpenAI would never release a model served as fully BF16 in the year of our lord 2026. That would be insane operationally. They're almost certainly doing QAT to FP4 for FFN, and a similar or slightly larger quant for attention tensors.
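For readers unfamiliar with the term: QAT to FP4 usually means the forward pass sees weights that have been rounded ("fake-quantized") to a 4-bit float grid, so the model learns to tolerate that rounding. Below is a minimal sketch of that rounding step. It assumes the E2M1 4-bit float format and a hypothetical per-block scheme (one float scale per 32 weights, chosen so the block's largest magnitude maps to 6.0). Real formats such as MXFP4 restrict the shared scale to a power of two, and nothing here describes what OpenAI actually runs.

```python
import math

# Magnitudes representable by an FP4 E2M1 value (1 sign, 2 exponent, 1 mantissa bit).
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]

def fake_quant_fp4(weights, block_size=32):
    """Round each block of weights to the nearest scaled E2M1 value and return
    the dequantized floats (what a QAT forward pass would compute with).
    The per-block float scale is an illustrative assumption, not a real spec."""
    out = []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        amax = max(abs(w) for w in block)
        # Map the largest magnitude in the block onto 6.0, the largest E2M1 value.
        scale = amax / 6.0 if amax > 0 else 1.0
        for w in block:
            mag = abs(w) / scale
            # Nearest grid point; on an exact tie, min() keeps the smaller value.
            nearest = min(E2M1_GRID, key=lambda g: abs(g - mag))
            out.append(math.copysign(nearest * scale, w))
    return out

print(fake_quant_fp4([0.12, -0.03, 0.6, -0.45], block_size=4))
```

With a block maximum of 0.6 the scale is 0.1, so the four weights land on roughly 0.1, -0.05, 0.6 and -0.4. Each weight then costs 4 bits plus a small share of its block's scale, versus 16 bits in BF16.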


selcuka 36 days ago

It's ok if they never release a BF16 model, but it's less ok if they release it, win the benchmarks, then quantise it after a few weeks.


jychang 34 days ago

That would be REALLY easy to detect. It'll be 4x slower.

The tokens/sec of the model is basically directly proportional to the memory bandwidth of the hardware it runs on, and inversely proportional to the bytes of weights it has to read per token. So either OpenAI has to gimp model performance for its entire life, or somehow magically speed it up 4x on the first day.
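The 4x figure follows from a back-of-envelope bound: if decoding is memory-bandwidth-bound, each generated token has to stream every weight from memory once, so throughput is at most bandwidth divided by weight bytes. A minimal sketch, assuming batch size 1, no KV-cache traffic, and hypothetical numbers (200B weights read per token, 8 TB/s of bandwidth) that are not OpenAI's. For a mixture-of-experts model, the relevant count would be the active parameters per token.

```python
def max_decode_tokens_per_sec(n_params, bytes_per_param, bandwidth_bytes_per_sec):
    """Idealized upper bound for single-stream decoding when every token
    must read all weights once (batch size 1, KV cache ignored)."""
    weight_bytes = n_params * bytes_per_param
    return bandwidth_bytes_per_sec / weight_bytes

N_PARAMS = 2e11   # hypothetical: 200B parameters read per token
BANDWIDTH = 8e12  # hypothetical: 8 TB/s aggregate memory bandwidth

bf16 = max_decode_tokens_per_sec(N_PARAMS, 2.0, BANDWIDTH)  # 16 bits per weight
fp4 = max_decode_tokens_per_sec(N_PARAMS, 0.5, BANDWIDTH)   # 4 bits, scale overhead ignored
print(f"BF16: {bf16:.0f} tok/s, FP4: {fp4:.0f} tok/s, ratio {fp4 / bf16:.1f}x")
# prints "BF16: 20 tok/s, FP4: 80 tok/s, ratio 4.0x"
```

The ratio depends only on bytes per weight, which is why the absolute numbers matter less than the 16-bit vs 4-bit gap. Large-batch serving, KV-cache reads, and compute limits all pull real systems away from this ideal.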


retinaros 35 days ago

That is for sure what everyone does. Also, they train on evals, using the datasets they would be benchmarked against.


tedsanders 35 days ago

What do you mean by this? We don’t train on evals, and if we did I’d quit on the spot.

(The loose version of this that’s true is that there may exist eval data contamination in pretraining. This is a hard problem to fully solve.)
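One common, admittedly imperfect, mitigation for pretraining contamination is an n-gram overlap filter: flag any eval example that shares a long word n-gram with the training corpus. A minimal sketch, assuming simple regex word tokenization and a 13-gram threshold (a choice some published decontamination work has used; the thread does not say what OpenAI uses). It also hints at why the problem is hard: paraphrased, reformatted, or translated eval data slips straight through an exact-match check like this.

```python
import re

def ngrams(text, n):
    """Set of word-level n-grams in lowercased text (empty if fewer than n words)."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contaminated(eval_examples, training_docs, n=13):
    """Return indices of eval examples sharing at least one n-gram with any training doc."""
    train_grams = set()
    for doc in training_docs:
        train_grams |= ngrams(doc, n)
    return [i for i, ex in enumerate(eval_examples) if ngrams(ex, n) & train_grams]

train = ["the quick brown fox jumps over the lazy dog near the river bank today"]
evals = ["Quick brown fox jumps over the lazy dog", "completely unrelated sentence here"]
print(contaminated(evals, train, n=5))  # prints [0]
```

At web scale the training-side n-gram set would not fit in a Python set; production pipelines typically use hashing or Bloom filters, but the matching logic is the same.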


retinaros 35 days ago

It's not that loose a version. It's the reality, and RL post-training on these kinds of GitHub repos is probably a dedicated focus. Of course you would train specifically on the task; you would just mix this eval data in with thousands of other GitHub repos.
