by obblekk 205 days ago
I see 25-29% at https://www.swebench.com/viewer.html for models released in Nov 2024, albeit not verified. GPT-4o (Aug 2024) was at 33% on SWE-bench Verified.

This is an important point, because people tend to underestimate the speed of AI progress.

1 comment

Do you people think nobody calls your bluff?

Here’s the launch card for Sonnet 3.5 from a year and a month ago. Guess the number. OK, I’ll tell you: 49.0%. So yeah, the comment you replied to was not really off.

https://www.anthropic.com/news/3-5-models-and-computer-use