by unsupp0rted 377 days ago
Curious to see how this compares to Claude 4 Sonnet in code.

This table seems to indicate it's markedly worse?

https://blog.google/products/gemini/gemini-2-5-pro-latest-pr...

1 comment

Almost all of those benchmarks are coding-related. It looks like SWE-Bench is the only one where Claude scores higher. It's hard to say which benchmark is most representative of actual work. The community seems to like Aider Polyglot, from what I've seen.