Hacker News new | ask | show | jobs
by zuzuen_1 308 days ago
I would be more interested in Qodo's performance on the swe-bench-multilingual benchmark. Swe-bench-verified only includes bugs related to python breakages.

The best submission is swe-bench-multilingual is Claude 3.7 Sonnet which solves ~43% of the issues in the dataset.