by fragmede
147 days ago
Because they are getting better. They're still far from perfect/AGI/ASI, but when was the last time you saw the word "delve"? So the models are clearly changing; the question is why the data doesn't show that they're actually better.

Thing is, everyone knows the benchmarks are being gamed. Exactly how is beside the point. In practice, anecdotally, Opus 4.5 is noticeably better than 4, and GPT 5.2 has also noticeably improved. So maybe the real question is: why do you believe this data when it seems at odds with observations by humans in the field?

> Jeff Bezos: When the data and the anecdotes disagree, the anecdotes are usually right.

https://articles.data.blog/2024/03/30/jeff-bezos-when-the-da...
|
Most of what I can do now with them I could do half a year to a year ago. And all the mistakes and fail loops are still there, across all models.
What changed is the number of magical incantations we throw at these models in the form of "skills", "plugins", and "tools", hoping that one of them will solve the issue at hand before the context window overflows.