

by golfer 70 days ago

Arena.ai:

> Gemini 3.5 Flash’s pricing shifts the Pareto frontier in Text. 8 models from GoogleDeepMind dominate the Text Arena Pareto curve where only 4 labs are represented for top performance in their price tiers.

https://x.com/arena/status/2056793180998361233

2 comments

h14h 70 days ago

Given how widely the number of tokens each model uses for a given task varies, "price-per-token" is essentially meaningless when doing this sort of comparison.

Artificial Analysis's "Cost to run" model (aka num_tokens_used * price_per_token) is much better, but even that is likely problematic since it's not clear whether running a bunch of benchmarks maps cleanly to real-world token use.
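To make the difference concrete, here is a minimal sketch of the num_tokens_used * price_per_token calculation. The model names, prices, and token counts are made up for illustration and are not taken from Arena or Artificial Analysis; the point is only that ranking by price-per-token and ranking by cost-to-run can disagree.

```python
from dataclasses import dataclass


@dataclass
class ModelRun:
    name: str
    price_per_mtok: float  # USD per million tokens (hypothetical value)
    tokens_used: int       # tokens spent on the same task set (hypothetical value)


def cost_to_run(run: ModelRun) -> float:
    """Total cost in USD: num_tokens_used * price_per_token."""
    return run.tokens_used * run.price_per_mtok / 1_000_000


# Illustrative numbers only: a cheap-per-token model that is verbose,
# and a pricier-per-token model that is terse.
runs = [
    ModelRun("model_a", price_per_mtok=1.0, tokens_used=8_000_000),
    ModelRun("model_b", price_per_mtok=3.0, tokens_used=2_000_000),
]

cheapest_by_price = min(runs, key=lambda r: r.price_per_mtok).name
cheapest_by_cost = min(runs, key=cost_to_run).name

print(cheapest_by_price)  # the model with the lowest per-token price
print(cheapest_by_cost)   # the model with the lowest total cost
```

With these made-up inputs the two rankings pick different models, which is the failure mode of a price-per-token axis. It says nothing about whether benchmark token counts match real-world usage.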


ohlookcake 70 days ago

That graph seems odd. It looks like Gemini 3.5 Flash is not actually on the convex hull, and they forced the 'frontier' to bend inwards to include it.
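For what it's worth, a point can be Pareto-optimal without being on the convex hull, which would produce exactly this kind of inward bend. Below is a rough sketch of the distinction, assuming lower price and higher score are better. The points are invented, and I don't know how Arena actually draws its curve.

```python
def pareto_frontier(points):
    """Keep points that no other point beats on both price and score.

    points: list of (name, price, score) tuples.
    """
    frontier = []
    best_score = float("-inf")
    # Walk from cheapest to most expensive; a point survives only if it
    # scores higher than everything cheaper than it.
    for name, price, score in sorted(points, key=lambda p: (p[1], -p[2])):
        if score > best_score:
            frontier.append((name, price, score))
            best_score = score
    return frontier


def cross(o, a, b):
    """Cross product of vectors o->a and o->b in the (price, score) plane."""
    return (a[1] - o[1]) * (b[2] - o[2]) - (a[2] - o[2]) * (b[1] - o[1])


def upper_hull(points):
    """Upper convex hull of the Pareto points (monotone chain scan)."""
    hull = []
    for p in pareto_frontier(points):
        # Drop the previous point if it lies on or below the line to p.
        while len(hull) >= 2 and cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    return hull


# Invented (name, price, score) points, for illustration only.
points = [("a", 1, 10), ("b", 2, 11), ("c", 4, 20), ("d", 3, 9)]

pareto_names = [name for name, _, _ in pareto_frontier(points)]
hull_names = [name for name, _, _ in upper_hull(points)]
print(pareto_names)
print(hull_names)
```

Here "b" is undominated (nothing cheaper scores higher) but sits below the straight line from "a" to "c", so a Pareto curve passes through it while a convex hull skips it. Whether the chart is meant to be a Pareto staircase or a hull is the thing to check before calling it forced.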
