| HN Mirror

	by zuzuen_1 355 days ago
	I would be more interested in Qodo's performance on the swe-bench-multilingual benchmark. Swe-bench-verified only includes bugs in Python projects. The best submission on swe-bench-multilingual is Claude 3.7 Sonnet, which solves ~43% of the issues in the dataset.