by
qwesr123
187 days ago
Not sure why they don't compare with others, but they are actually leading on the benchmarks they published. See here (bottom) for a chart comparing to other models:
https://marginlab.ai/blog/swe-bench-deep-dive/
2 comments
mistercheph
186 days ago
It's like Apple: they just don't want users or anyone else to even be thinking of their competitors. The competition doesn't exist; it's not relevant.
whimsicalism
186 days ago
Is SWE-bench saturated? Or did they switch to SWE-bench Pro because...?
Mkengin
186 days ago
At least on swe-rebench it does pretty well:
https://swe-rebench.com/