Hacker News
by
GodelNumbering
206 days ago
Makes it sound like a one trick pony
2 comments
jascha_eng
206 days ago
Anthropic is leaning heavily into agentic coding, so it makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not get the top spot last week. Claude remains king; that's all that matters here.
Mkengin
205 days ago
I am eagerly awaiting swe-rebench results for November with all the new models:
https://swe-rebench.com/
grantpitt
206 days ago
well, it's a big trick