| HN Mirror



	by hit8run 51 days ago
	This benchmark paints a very different picture, with GPT-5.5 at the very top at 70% and DeepSeek at 8%: https://deepswe.datacurve.ai

1 comment

zozbot234 51 days ago

DeepSWE has been heavily criticized, though: https://github.com/datacurve-ai/deep-swe/issues/21 Putting GPT-5.5 on top is the obviously correct part, but everything else about it makes very little sense.
