HN Mirror

	by haffi112 315 days ago
	It makes it look like the presentation is rushed or made last minute. Really bad to see this as the first plot in the whole presentation. Also, I would have loved to see comparisons with Opus 4.1. Edit: Opus 4.1 scores 74.5% (https://www.anthropic.com/news/claude-opus-4-1). This makes it sound like Anthropic released the upgrade to still be the leader on this important benchmark.

2 comments

> like the presentation is rushed or made last minute

Or written by GPT-5?

They never compare with other vendors