by vlmutolo 103 days ago
Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half of the tokens of 2.5 Flash-Lite (reasoning).

This will likely bring the cost below 2.5 Flash-Lite for many tasks (depends on the ratio of input to output tokens).

That said, AA also reports that 3.1 FL was 20% more expensive to run on their complete Intelligence Index benchmark.

The overall point is that cost is extremely task-dependent, and per-token price alone doesn’t capture it: reasoning can burn a lot of tokens, reasoning token usage varies by both task and model, and input/output ratios vary by task as well.
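
To make that concrete, here is a rough sketch of the arithmetic. All prices and token counts are made-up placeholders, not real Gemini pricing or AA measurements, and it assumes reasoning tokens are billed at the output rate. The names `Pricing` and `task_cost` are just for illustration.

```python
from dataclasses import dataclass


@dataclass
class Pricing:
    """Hypothetical per-million-token prices in USD."""
    input_per_mtok: float
    output_per_mtok: float  # assumption: reasoning tokens are billed as output


def task_cost(p: Pricing, input_tokens: int, answer_tokens: int,
              reasoning_tokens: int) -> float:
    """Total USD cost of one request, counting reasoning as output."""
    billed_output = answer_tokens + reasoning_tokens
    return (input_tokens * p.input_per_mtok
            + billed_output * p.output_per_mtok) / 1_000_000


# Placeholder prices: the "new" model is 1.5x pricier per token.
old = Pricing(input_per_mtok=1.0, output_per_mtok=4.0)
new = Pricing(input_per_mtok=1.5, output_per_mtok=6.0)

# Reasoning-heavy task: short prompt, new model reasons with under half the tokens.
reasoning_heavy_old = task_cost(old, 1_000, 500, 8_000)
reasoning_heavy_new = task_cost(new, 1_000, 500, 3_500)

# Input-heavy task: long prompt, little reasoning on either model.
input_heavy_old = task_cost(old, 50_000, 500, 1_000)
input_heavy_new = task_cost(new, 50_000, 500, 400)

print(f"reasoning-heavy: old ${reasoning_heavy_old:.4f}, new ${reasoning_heavy_new:.4f}")
print(f"input-heavy:     old ${input_heavy_old:.4f}, new ${input_heavy_new:.4f}")
```

With these placeholder numbers the pricier model comes out cheaper on the reasoning-heavy task and more expensive on the input-heavy one, which is the whole point: the ranking flips with the task mix.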

2 comments

> 3.1 Flash-Lite (reasoning)

(reasoning) doesn't say much. Is it low/med/high reasoning? I ran my own benchmarks, and 3.1 Flash-Lite on high costs A LOT: https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...

Do not use 3.1 Flash-Lite with HIGH reasoning: it reasons for almost the max output size, so you can quickly get to millions of reasoning tokens in a few requests.

Wow, that’s very interesting. I wish more benchmarks were reported along with the total cost of running that benchmark. Dollars per token is kind of useless for the reasons you mentioned.
Yup, MiniMax M-2.5 is a standout in that aspect. Its $/token is very low, but it reasons forever (fun fact: that's also why it's #1 on OpenRouter, since it simply burns through tokens and the OpenRouter ranking is based on token usage)...
many tasks don't need any reasoning