by modeless
479 days ago
> When we filtered out these problematic issues, the resolution rate of SWE-Agent+GPT-4 dropped from 12.47% to 3.97%.

This matches my intuition about the coding performance of these models a lot better. I don't think any current coding benchmark accurately measures coding performance.