by esafak
272 days ago
This was the perfect opportunity to share the evidence. I think undisclosed quantization is definitely a thing. We need benchmarks to be periodically re-evaluated to guard against this. Providers should keep timestamped models fixed and assign modified versions a new timestamp, and a new price if they want. The model with the "latest" tag could change over time, like a Docker image. Then we could make an informed decision about which version to use. Companies want to cost-optimize their cake and eat it too.

Edit: I have the same complaint about my Google Home devices. The models they use today are indisputably worse than the ones they used five whole years ago, and features have been removed without notice. Qualitatively, the devices are no longer what I bought.
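To make the "periodically re-evaluate a pinned version" idea concrete, here is a minimal sketch. Everything in it is hypothetical: the `EvalRecord` type, the `score_model` and `drifted` helpers, the model identifier, and the tolerance are illustrative assumptions, and `answer_fn` stands in for whatever client call you would actually make to a provider.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class EvalRecord:
    model_id: str  # hypothetical pinned identifier, e.g. "example-model-2025-01-01"
    date: str      # when the benchmark was run
    score: float   # fraction of benchmark items answered correctly


def score_model(answer_fn: Callable[[str], str],
                benchmark: Sequence[Tuple[str, str]]) -> float:
    """Run a fixed benchmark of (question, expected answer) pairs."""
    correct = sum(1 for question, expected in benchmark
                  if answer_fn(question) == expected)
    return correct / len(benchmark)


def drifted(history: Sequence[EvalRecord], tolerance: float = 0.02) -> bool:
    """Flag a pinned model whose latest score fell below its first score.

    The tolerance is an arbitrary assumption; in practice it should be
    larger than the run-to-run noise of the benchmark.
    """
    if len(history) < 2:
        return False
    baseline, latest = history[0], history[-1]
    if baseline.model_id != latest.model_id:
        raise ValueError("compare runs of the same pinned model only")
    return baseline.score - latest.score > tolerance
```

The comparison is only meaningful against a fixed identifier; a floating "latest" alias is expected to change, so a score shift there tells you nothing about silent modification.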
> “Not quantized. Weights are the same. If we did change the model, we’d release it as a new model with a new name in the API.”
- [Ted Sanders](https://news.ycombinator.com/item?id=44242198) (OpenAI)
The problem here is that most issues stem from broader infrastructure problems, like numerical instability at inference time. Since this affects their whole service pipeline, the logic can't really be encapsulated in a frozen environment like a Docker container. I suppose _technically_ they could maintain a separate inference cluster for each of their point releases, but then previous models wouldn't benefit from common infrastructure improvements, load balancing would be harder to shard across GPUs, and the whole thing might be so hard to coordinate that it's effectively impossible.
https://www.anthropic.com/engineering/a-postmortem-of-three-... https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
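As a toy illustration of why "same weights" does not guarantee "same outputs": floating-point addition is not associative, so changing the order in which partial results are combined can change the answer. The sketch below is plain Python and makes no claim about any provider's actual inference stack; `left_fold_sum` and `chunked_sum` are hypothetical helpers, and the chunking merely stands in for a reduction being split differently across hardware or batch sizes.

```python
def left_fold_sum(values):
    """Add values strictly left to right with ordinary float addition."""
    total = 0.0
    for v in values:
        total += v
    return total


def chunked_sum(values, chunk_size):
    """Sum each chunk separately, then combine the partial sums.

    This mimics a reduction whose grouping depends on how the work
    was split up, which is an assumption made for illustration.
    """
    partials = [left_fold_sum(values[i:i + chunk_size])
                for i in range(0, len(values), chunk_size)]
    return left_fold_sum(partials)


# Grouping alone changes the result of adding the same three numbers.
print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))  # False

# Same inputs, different grouping, different sum.
values = [1e16, 1.0, -1e16, 1.0]
print(left_fold_sum(values), chunked_sum(values, 2))
```

The helpers use an explicit loop rather than the built-in `sum`, because recent Python versions apply compensated summation to floats, which would hide the effect.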