by ai-tamer
53 days ago
Same. The numbers match your impression. Going from 4.6 to 4.7: +14.6 on MCP-Atlas, +10.9 on SWE-bench Pro, and tool errors cut by two-thirds. But BrowseComp dropped 4.7 points. Anthropic's own announcement says 4.7 "takes the instructions literally" where 4.6 interpreted them loosely, and recommends re-tuning prompts accordingly. In a conversational loop with an opinionated developer, that translates to lower quality because there is less reasoning: the model executes the instruction instead of thinking the problem through.
https://llm-stats.com/blog/research/claude-opus-4-7-vs-opus-...
https://www.anthropic.com/news/claude-opus-4-7