by xg15
272 days ago
Sorry, but this makes no sense. Numerical instability would lead to random fluctuations in output quality, not to a continuous slow decline like the one the OP described. I've heard of similar experiences from RL acquaintances, where a prompt worked reliably for hundreds of requests per day for several months, and then the model suddenly started to make mistakes, ignore parts of the prompt, etc. when a newer model was released. I agree, it doesn't have to be deliberate malice like intentionally nerfing a model to make people switch to the newer one. It might just be that fewer resources are allocated to the older model once the newer one is available, so the inference parameters change. But some effect at the release of a newer model seems to be there.
As for the original forum post:
- Multiple numerical computation bugs can compound to make things worse (we saw this in the latest Anthropic post-mortem)
- OP didn't provide any details on eval methodology, so I don't think it's worth speculating on this anecdotal report until we see more data