by gronky_
305 days ago
It’s a pass@1 benchmark. When submitting, you need to check a box confirming that there was only one attempt per problem. See here for an example: https://github.com/SWE-bench/experiments/pull/219

Building multiple attempts into your agent is stretching the rules, even if it’s technically acceptable.
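To make the distinction concrete, here is a rough sketch of how a score changes when only the first attempt counts versus when any of several attempts may count. The problem names and outcomes are made-up toy data, and `pass_at_1` and `best_of_k` are hypothetical helper names, not part of any benchmark tooling.

```python
# Hypothetical toy data: per-problem outcomes of independent attempts
# (True = the attempt resolved the problem). These are NOT real results.
attempts = {
    "problem-a": [True, False, False],
    "problem-b": [False, True, False],
    "problem-c": [False, False, False],
    "problem-d": [True, True, True],
}


def pass_at_1(results):
    """Fraction of problems solved when only the first attempt counts."""
    return sum(r[0] for r in results.values()) / len(results)


def best_of_k(results):
    """Fraction of problems solved when any of the k attempts may count."""
    return sum(any(r) for r in results.values()) / len(results)


print(pass_at_1(attempts))  # 0.5
print(best_of_k(attempts))  # 0.75
```

The best-of-k number is an upper bound: an agent that retries internally still has to choose one attempt to submit without seeing the hidden tests, so its real score would probably land somewhere between the two.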