Hacker News new | ask | show | jobs
by a2128 380 days ago
I think at this point we're reaching more incremental updates, which can score higher on some benchmarks but then simultaneously behave worse with real-world prompts, most especially if they were prompt engineered for a specific model. I recall Google updating their Flash model on their API with no way to revert to the old one and it caused a lot of people to complain that everything they've built is no longer working because the model is just behaving differently than when they wrote all the prompts.
1 comments

Isn't it quite possible they replaced that Flash model with a distilled version, saving money rather than increasing quality? This just speaks to the value of open-weights more than anything.