by vlmutolo
103 days ago
Lots of comments about the price change, but Artificial Analysis reports that 3.1 Flash-Lite (reasoning) used fewer than half the tokens of 2.5 Flash-Lite (reasoning). This will likely bring the cost below 2.5 Flash-Lite for many tasks (depending on the ratio of input to output tokens). That said, AA also reports that 3.1 FL was 20% more expensive to run for their complete Intelligence Index benchmark. The overall point is that cost is extremely task-dependent, and it doesn’t work to just compare per-token prices: reasoning can burn a lot of tokens, reasoning token usage varies by both task and model, and input/output ratios vary by task too.
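To make the input/output-ratio point concrete, here is a rough sketch in Python. The prices and token counts are made-up placeholders, not the real list prices or measured usage of either model. The only thing loosely taken from the numbers above is the assumption that the newer model emits fewer than half as many output (reasoning) tokens (0.4x here). The names `request_cost`, `OLD`, and `NEW` are hypothetical.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Dollar cost of one request; prices are dollars per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical placeholder prices ($ per 1M tokens), NOT real list prices.
OLD = {"input_price": 1.0, "output_price": 4.0}
NEW = {"input_price": 1.5, "output_price": 6.0}  # assumed 1.5x pricier per token

# Assumption: the new model needs 0.4x the output (reasoning) tokens.
OUTPUT_TOKEN_RATIO = 0.4


def compare(input_tokens, old_output_tokens):
    """Return (old_cost, new_cost) for the same task on both models."""
    new_output_tokens = int(old_output_tokens * OUTPUT_TOKEN_RATIO)
    old_cost = request_cost(input_tokens, old_output_tokens, **OLD)
    new_cost = request_cost(input_tokens, new_output_tokens, **NEW)
    return old_cost, new_cost


# Output-heavy task: short prompt, long reasoning.
old_out_heavy, new_out_heavy = compare(input_tokens=1_000, old_output_tokens=10_000)

# Input-heavy task: long prompt, short answer.
old_in_heavy, new_in_heavy = compare(input_tokens=100_000, old_output_tokens=1_000)

print(f"output-heavy: old=${old_out_heavy:.4f} new=${new_out_heavy:.4f}")
print(f"input-heavy:  old=${old_in_heavy:.4f} new=${new_in_heavy:.4f}")
```

With these placeholder numbers the pricier model comes out cheaper on the output-heavy task and more expensive on the input-heavy one, which is the same kind of split as "cheaper for many tasks, but more expensive on the full benchmark".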
(reasoning) doesn't say much. Is it low/med/high reasoning? I ran my own benchmarks, and 3.1 Flash-Lite on high costs A LOT: https://aibenchy.com/compare/google-gemini-3-1-flash-lite-pr...
Do not use 3.1 Flash-Lite with HIGH reasoning: it reasons for almost the max output size, so you can quickly get to millions of reasoning tokens in a few requests.