| HN Mirror



	by taurath 51 days ago
	Doesn't this "silent degradation" prevent any actual evaluation of the model? If the model fails at something, this allows anyone to claim that it failed due to degradation.

2 comments

lionkor 51 days ago

Who cares if it can be evaluated independently? The majority of commenters on HN were happy to vibe code and ship products with the models we had 1-2 years ago. It continues to be laughable.

I understand that moving the goalposts every release is unfair, but it's similarly concerning to consider that people were letting GPT 4.X vibe code and ship entire products.


janalsncm 51 days ago

I don’t think so? They can claim it was an act of God for all I care, but at the end of the day the model failed the task.
