


	by bwestergard 814 days ago
	Friendly suggestion to the authors: success rates aren't meaningful to all but a handful of researchers. They should add a few examples of tests SWE-agent passed and did not pass to the README.


Yes please. The code quality from Devin was incredibly poor in every example I tracked down, at least from a maintainability perspective.

I would like to see whether this implementation is less destructive, or at least better suited to a red-green-refactor workflow.

Unless you weren't actually that successful but needed to publish a "successful" result.