by threepts
37 days ago
You can read the paper here: https://labs.scale.com/papers/swe_bench_pro

TL;DR: it's very effective because it directly tests models on REAL codebases: "The benchmark is constructed from GPL-style copyleft repositories and private proprietary codebases". The use case is very real.