Hacker News
by throwawayffffas 52 days ago
> Note: "Benchmarks are less important than real-world tests for production adoption"

> Significantly better SWE-Bench (+56 pts), MCP tool use (2x), and agent workflows.

What? Make up your mind: do the benchmarks matter or not?