by ofirpress
278 days ago
[I'm on the SWE-bench team] Multiple people have looked into this, for example right in that thread: https://github.com/SWE-bench/SWE-bench/issues/465#issuecomme... This issue affected a tiny fraction of existing agents in a tiny fraction of their runs, and we've now issued a fix. This is a natural part of running a benchmark; I'm sure tiny things like this will keep getting discovered, and we'll keep fixing them. This doesn't change the overall picture or trends at all.
Edit: That said, I’m willing to believe based on the information in the thread that this most likely only affects a tiny fraction of runs.