HN Mirror



	by a2128 380 days ago
	I think at this point we're getting more incremental updates, which can score higher on some benchmarks while behaving worse with real-world prompts, especially prompts that were engineered for a specific model. I recall Google updating their Flash model on their API with no way to revert to the old one, and a lot of people complained that everything they'd built no longer worked because the model behaved differently than when they wrote their prompts.

1 comment

whbrown 379 days ago

Isn't it quite possible that they replaced that Flash model with a distilled version to save money rather than to improve quality? More than anything, this speaks to the value of open-weight models.
