by ectopasm83 806 days ago

The point is that the success rate keeps improving, paper after paper.

> The baseline results of Magis (10%), Devin (14%) are evaluated in another subset of SWE-bench, which we cannot directly compare with, so we take the results from their technical reports as a reference.

I wonder how it compares with these models.

1 comment

invalidusernam3 806 days ago

Why not use AutoCodeRover, Magis, and Devin together for 46%?
