by andyk
544 days ago
That has a double meaning, half tongue-in-cheek: 1) since we are creating a contamination-free version of SWE-bench (i.e., scraping a new test set after submissions are frozen), it is guaranteed that agents in this contest can't "cheat": models can't have trained on the benchmark, and agents can't memorize the answers. 2) As a general rule in life, don't cheat on things (not that there aren't exceptions).