by akitaonrails
25 days ago
TL;DR: Opus 4.8 doesn’t feel much different from Opus 4.7, not in daily use and not in the benchmark. 95/100 against 97/100, inside the noise. It’s the fastest Opus I’ve measured, but the day-to-day experience is the same.
Grok 4.3, anecdotally, struck me as a bit more literal and strict about following the prompt. In the benchmark it improved substantially over the previous generation (Grok 4.20). It still doesn’t come close to Opus or GPT, but at least now it starts to be usable. 72/100, Tier B.
MiniMax M3, same story. The previous version (M2.7) was unusable, and the new one is finally at least usable. 78/100, Tier B.
All three land roughly in the band of a Sonnet 4.6 or a DeepSeek V4. One or two generations behind the new Opus and GPT.