by crishoj 264 days ago
Any idea what "output token efficiency" refers to? Gemini Flash is billed by number of input/output tokens, which I assume is fixed for the same output, so I'm struggling to understand how it could result in lower cost. Unless of course they have changed tokenization in the new version?
3 comments

They provide the answer in fewer words (while still conveying what needed to be said).

Which is a good thing in my book as the models now are way too verbose (and I suspect one of the reasons is the billing by tokens).

The post implies that the new models are better at thinking, so less time and cost is spent overall.

The first chart implies the gains are minimal for nonthinking models.

The models are less verbose, so they produce fewer output tokens, so answers cost less.
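
To put rough numbers on that, here is a minimal sketch. The price and token counts are made up for illustration and are not Gemini's actual rates; `output_cost` is a hypothetical helper, not part of any API.

```python
def output_cost(output_tokens: int, price_per_million: float) -> float:
    """Cost of a response's output tokens at a fixed per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million

# Hypothetical price in dollars per 1M output tokens (assumption, not a real rate).
PRICE_PER_MILLION = 2.50

# Same question, same price per token; only the answer length differs.
verbose_cost = output_cost(800, PRICE_PER_MILLION)  # older, wordier answer
concise_cost = output_cost(500, PRICE_PER_MILLION)  # newer, terser answer

# Fraction of output cost saved by the shorter answer.
savings = 1 - concise_cost / verbose_cost
print(f"verbose: ${verbose_cost:.5f}, concise: ${concise_cost:.5f}, saved: {savings:.1%}")
```

The per-token price and the tokenizer stay the same here. The bill drops only because the answer itself is shorter.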