by gundmc
377 days ago
Almost all of those benchmarks are coding-related. It looks like SWE-Bench is the only one where Claude is higher. Hard to say which benchmark is most representative of actual work. The community seems to like Aider Polyglot, from what I've seen.