| HN Mirror

by Snuggly73 168 days ago

OK, if it's almighty, then why aren't the benchmarks at 100%? If you look at the individual issues, they are fairly small, trivial changes to existing codebases.

https://swe-rebench.com/

(Note that if you look at individual slices, Opus is often outperformed by Sonnet.)