Hacker News
by Snuggly73 168 days ago
Ok, if it's almighty, then why aren't the benchmarks at 100%? If you look at the individual issues, they are fairly small, trivial changes to existing codebases.

https://swe-rebench.com/

(Note that if you look at individual slices, Opus is often outperformed by Sonnet.)