by throwawayffffas
52 days ago
> Note: "Benchmarks are less important than real-world tests for production adoption"

> Significantly better SWE-Bench (+56 pts), MCP tool use (2x), and agent workflows.

What? Make up your mind: do the benchmarks matter or not?