Not my experience at all, I have written about comparing DS4 vs Opus 4.8 on 16 real life work tasks on multiple posts.
Also, every single lab does RL on benchmarks, which is why Opus 4.6 was the last truly great assistant, after it, all models tend to drift into implementation asap.
Also, every single lab does RL on benchmarks, which is why Opus 4.6 was the last truly great assistant, after it, all models tend to drift into implementation asap.