by spgorbatiuk
9 hours ago
Not sure I got the question right, but there are benchmarks like SWE pro and others. There's a whole other debate about whether you can trust them, and whether the labs are training on those benchmarks, but that's one way to measure it. Other than benchmarks, I'd say the answer is your own test suite.