by cjsaltlake
58 days ago
SWE-bench was created to replace olympiad coding benchmarks. I think past olympiad coding benchmarks were much less representative of real-world coding than something like SWE-bench, which is derived from real units of labor. Further, olympiad-style benchmarks are arguably easier to contaminate / memorize unless you refresh them regularly; but that goes for SWE-bench too.
Simple enough that anyone could run it with a regular subscription.
Really, unless we can get the providers to ditch the gameable benchmarks, they won't.
But industries love nothing more than a benchmark they can manipulate.