by Snuggly73
168 days ago
OK, if it's almighty, then why aren't the benchmarks at 100%? If you look at the individual issues, they are somewhat small and trivial changes to existing codebases. https://swe-rebench.com/ (note that if you look at individual slices, Opus is often outperformed by Sonnet).