by stared
52 days ago
Thank you for sharing the benchmark. However, the results are selective. Why no Opus 4.7? Why is Gemini 3.1 Pro missing? If there is some other criterion (e.g. models within a certain time or budget), great - just make it explicit. When I see "Top 5 at a glance" and it misses key frontier models, I am (at best) confused.
|
For the benchmark, the setup was kept consistent across all models, and Opus and 3.1 Pro would typically be overkill and expensive even with reasoning off.
Good point though, I will add this to the blog too :)
Also, the benchmark is open source, so anyone can run a model on it and open a PR; the leaderboard is dynamic and will automatically add the result.