by ofirpress
396 days ago
Not sure what you mean by "benchmaxxing", but we think there are still a lot of useful signals you can infer from SWE-bench-style benchmarking. We also have SWE-bench Multimodal, which adds a twist I haven't seen elsewhere:
https://www.swebench.com/multimodal.html