| HN Mirror

	by grantpitt 206 days ago
	do say more

1 comments

GodelNumbering 206 days ago

Makes it sound like a one-trick pony

jascha_eng 206 days ago

Anthropic is leaning heavily into agentic coding, so it makes sense to use SWE-bench Verified as their main benchmark. It is also the one benchmark where Google did not take the top spot last week. Claude remains king; that's all that matters here.

Mkengin 205 days ago

I am eagerly awaiting the November SWE-rebench results with all the new models: https://swe-rebench.com/

grantpitt 206 days ago

well, it's a big trick
