Hacker News
by RockyMcNuts 537 days ago
Exactly ... Gemini 2.0 Flash ranks better on quality, is faster, and is cheaper if you assume the same pricing as 1.5 (and the price might go down).

these models are being commoditized.

https://artificialanalysis.ai/models/deepseek-v3

1 comments

For what it's worth, as always, 99% of benchmarks are very unreliable, and per-task performance still differs greatly between models, with plenty of cases where results are wildly different.

I have a task I use in my work where Gemini 1.5-Pro is SOTA, handily beating o1, Sonnet-3.5, Gemini-exp and everyone else, very consistently and significantly.

The newer/bigger models are better at reasoning and especially coding, but there are plenty of tasks that have little overlap with those skills.