by mordae 26 days ago
This is a terrible benchmark. It literally tests the models on their ability to track shifting line numbers. If they cannot keep up, no amount of abstract reasoning can redeem them.
1 comment

Where did you get that idea? It uses mini-swe-agent, the same as SWE-bench.

https://github.com/datacurve-ai/deep-swe