Hacker News
by bwestergard 814 days ago
Friendly suggestion to the authors: success rates are meaningful only to a handful of researchers. They should add to the README a few examples of tests SWE-agent passed and a few it did not pass.
2 comments

Yes please. The code quality from Devin was incredibly poor in every example I traced, at least from a maintainability perspective.

I would like to see if this implementation is less destructive or at least more suitable for a red-green-refactor workflow.

Unless you weren't actually that successful but need to publish a "successful" result.