Hacker News
by ectopasm83 806 days ago
The point is that the success rate keeps improving, paper after paper.

> The baseline results of Magis (10%), Devin (14%) are evaluated in another subset of SWE-bench, which we cannot directly compare with, so we take the results from their technical reports as a reference.

Wondering how it compares with these models.


Why not use AutoCodeRover, Magis, and Devin together for 46%?

/s