by veselin
358 days ago
I think people are too quick to assume this is amazing before it actually is. That doesn't mean it won't get there. But even with the best models and agents, most hard coding benchmarks are below 50%, and even SWE-bench Verified is at maybe 75 to 80%, not 95. Assuming agents just solve most problems is incorrect, even though they're really good at first prototypes. Also, in my experience agents are great up to a point and then fall off a cliff, not gradually. Past that point the errors you get are so diverse that you can't even characterize them.