HN Mirror

by unsupp0rted 377 days ago

Curious to see how this compares to Claude 4 Sonnet in code.

This table seems to indicate it's markedly worse?

https://blog.google/products/gemini/gemini-2-5-pro-latest-pr...

1 comment

gundmc 377 days ago

Almost all of those benchmarks are coding-related. It looks like SWE-Bench is the only one where Claude scores higher. It's hard to say which benchmark is most representative of actual work. The community seems to like Aider Polyglot, from what I've seen.
