Hacker News
by taurath 2 days ago
Doesn't this "silent degradation" prevent any actual evaluation of the model? If the model fails at something, this allows anyone to claim that it failed due to degradation.
2 comments

Who cares if it can be evaluated independently? The majority of commenters on HN were happy to vibe code and ship products with the models we had 1-2 years ago. It continues to be laughable.

I understand that moving the goalposts with every release is unfair, but it's similarly concerning to consider that people were letting GPT 4.X vibe code and ship entire products.

I don’t think so? They can claim it was an act of God for all I care, but at the end of the day the model failed the task.