| HN Mirror


	by mordae 26 days ago
	This is a terrible benchmark. It literally tests the models on their ability to track shifting line numbers. If they cannot keep up, no amount of abstract reasoning can redeem them.

1 comment

Where did you get that idea? It uses mini-swe-agent, same as SWE-Bench.