
by vlmutolo 150 days ago

Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half as many tokens as 2.5 Flash-Lite (reasoning).

This will likely bring the cost below that of 2.5 Flash-Lite for many tasks (depending on the ratio of input to output tokens).

That said, AA also reports that 3.1 FL was 20% more expensive to run across their full Intelligence Index benchmark.

The overall point is that cost is extremely task-dependent. Measuring per-token price alone doesn't work: reasoning can burn a huge number of tokens, reasoning token usage varies by both task and model, and input/output ratios vary by task as well.
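To make the input/output-ratio point concrete, here is a minimal Python sketch of per-request cost. The prices and token counts are hypothetical placeholders, not real Gemini list prices or Artificial Analysis figures, and it assumes reasoning tokens are billed at the output-token rate. The "new" model is given 1.5x the per-token price but half the reasoning tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Per-million-token prices in USD (illustrative placeholders, not real list prices)."""
    input_per_m: float
    output_per_m: float


def task_cost(p: Pricing, input_tokens: int, output_tokens: int, reasoning_tokens: int) -> float:
    """Dollar cost of one request.

    Assumption: reasoning tokens are billed at the output-token rate.
    """
    billed_output = output_tokens + reasoning_tokens
    return (input_tokens * p.input_per_m + billed_output * p.output_per_m) / 1_000_000


# Hypothetical: the newer model costs 1.5x per token but reasons half as much.
OLD = Pricing(input_per_m=1.0, output_per_m=4.0)
NEW = Pricing(input_per_m=1.5, output_per_m=6.0)

# Reasoning-heavy task: the token savings outweigh the higher price.
reasoning_heavy_old = task_cost(OLD, input_tokens=1_000, output_tokens=500, reasoning_tokens=10_000)
reasoning_heavy_new = task_cost(NEW, input_tokens=1_000, output_tokens=500, reasoning_tokens=5_000)

# Input-heavy task (long prompt, short answer): the higher input price dominates.
input_heavy_old = task_cost(OLD, input_tokens=100_000, output_tokens=200, reasoning_tokens=1_000)
input_heavy_new = task_cost(NEW, input_tokens=100_000, output_tokens=200, reasoning_tokens=500)

print(f"reasoning-heavy: old ${reasoning_heavy_old:.4f} vs new ${reasoning_heavy_new:.4f}")
print(f"input-heavy:     old ${input_heavy_old:.4f} vs new ${input_heavy_new:.4f}")
```

With these made-up numbers the reasoning-heavy task comes out at roughly $0.043 (old) vs $0.0345 (new), while the input-heavy task goes the other way, roughly $0.105 vs $0.154. Same two price sheets, opposite conclusions, which is why per-token price alone can't tell you which model is cheaper for a given workload.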


XCSme 150 days ago

> 3.1 Flash-Lite (reasoning)

"(reasoning)" doesn't say much. Is that low, medium, or high reasoning? I ran my own benchmarks, and 3.1 Flash-Lite on high costs A LOT: https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...

Do not use 3.1 Flash-Lite with HIGH reasoning: it reasons until it almost hits the maximum output size, so you can quickly rack up millions of reasoning tokens in just a few requests.
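A back-of-the-envelope upper bound for this failure mode: if every request reasons until it hits the output cap, spend scales as requests × cap × output price. The 64K-token cap and $4 per 1M output tokens below are hypothetical placeholders, not Flash-Lite's actual limits or pricing, and the sketch assumes reasoning tokens count against the output cap and are billed at the output rate.

```python
def worst_case_reasoning_spend(
    requests: int, max_output_tokens: int, output_price_per_m: float
) -> tuple[int, float]:
    """Upper bound on tokens and USD if every request runs to the output cap.

    Assumption: reasoning tokens count against the output cap and are
    billed at the output-token rate.
    """
    tokens = requests * max_output_tokens
    dollars = tokens * output_price_per_m / 1_000_000
    return tokens, dollars


# Hypothetical placeholders: a 64K-token output cap and $4 per 1M output tokens.
tokens, dollars = worst_case_reasoning_spend(
    requests=20, max_output_tokens=65_536, output_price_per_m=4.0
)
print(f"{tokens:,} tokens, ${dollars:.2f}")
```

Under those assumptions, 20 requests already add up to about 1.3M tokens (roughly $5.24), so it's worth bounding reasoning effort or output length before running a large benchmark.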


vlmutolo 149 days ago

Wow, that’s very interesting. I wish more benchmarks reported the total cost of running them. Dollars per token is kind of useless for the reasons you mentioned.


XCSme 149 days ago

Yup, MiniMax M-2.5 stands out in that respect. Its $/token is very low, but it reasons forever (fun fact: that's also why it's #1 on OpenRouter; it simply burns through tokens, and the OpenRouter ranking is based on token usage).


XCSme 149 days ago

https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...


msp26 150 days ago

many tasks don't need any reasoning
