by epolanski 4 hours ago
Not my experience at all. I have written several posts comparing DS4 and Opus 4.8 on 16 real-life work tasks.

Also, every single lab does RL on benchmarks, which is why Opus 4.6 was the last truly great assistant. Since then, all models tend to jump into implementation ASAP.


Hi, author here. Can you link them? I would love to read about this.