Hacker News
by bustadjustme 377 days ago
Sorry if I missed it, but how is a single token of LLM output comparable to a single search-engine query? The author here compares 1k tokens (as an estimate for an average single LLM query response) to 1k web search queries. How is this not a factor-of-1000 error?

> To compare a midrange pair on quality, the Bing Search vs. a Gemini 2.5 Flash comparison shows the LLM being 1/25th the price.

That is, 40x the price _per query_ on average (1,000 ÷ 25), and the query is the unit of user interaction. LLMs with web search will only multiply this figure, since several searches are made behind the scenes for each user query.

EDIT: thanks, zahlman, he does quote LLM prices per 1M tokens, i.e. per 1k user queries, so the above concern is mistaken!
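A minimal sketch of the unit mix-up, assuming (as the author does) about 1,000 output tokens per LLM query. The 1/25 ratio comes from the quoted Bing vs. Gemini 2.5 Flash comparison; the search price `S` and the helper `per_query_ratio` are hypothetical placeholders for illustration, and `S` cancels out of both results.

```python
# Two readings of "the LLM is 1/25th the price".
# Assumption: ~1,000 output tokens per LLM query (the author's estimate).
# S is a hypothetical search price per 1,000 queries; any positive value works.

TOKENS_PER_QUERY = 1_000
S = 1.0


def per_query_ratio(llm_price: float, tokens_priced: int) -> float:
    """Hypothetical helper: (LLM cost per query) / (search cost per query).

    llm_price is the LLM price for a block of `tokens_priced` output tokens.
    """
    llm_per_query = llm_price / tokens_priced * TOKENS_PER_QUERY
    search_per_query = S / 1_000
    return llm_per_query / search_per_query


# Misreading: the LLM price covers 1k tokens, i.e. roughly one query.
misread = per_query_ratio(llm_price=S / 25, tokens_priced=1_000)

# Actual convention: the LLM price covers 1M tokens, i.e. roughly 1k queries.
actual = per_query_ratio(llm_price=S / 25, tokens_priced=1_000_000)

print(f"misread: LLM costs {misread:.0f}x as much per query")
print(f"actual:  LLM costs 1/{1 / actual:.0f} as much per query")
```

The misread line reports 40x and the actual line 1/25, which is exactly the factor of 1,000 the original question asked about.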

> The author here compares 1k tokens (as an estimate for an average LLM single query response) to 1k web search queries. How is this not a factor of 1000 error?

The author compares 1k uses of the LLM (an estimated 1M output tokens in total, and LLM prices are quoted per 1M tokens) to 1k uses of the search engine (whose prices are quoted directly per 1k uses).

Gemini 2.0 Flash is listed at 0.4 USD / 1M tokens. The Bing Search API is 15 USD / 1k queries. So the LLM is indeed about 37.5 times cheaper (15 ÷ 0.4) for a 1,000-token query.
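A quick check of that arithmetic, using only the two list prices quoted above and the author's estimate of about 1,000 output tokens per query. Prices change over time, so treat the constants as a snapshot from this comment rather than current pricing.

```python
# Per-query cost comparison using the prices quoted in the comment.
# Assumption: one LLM answer is ~1,000 output tokens.

LLM_USD_PER_1M_TOKENS = 0.4        # Gemini 2.0 Flash, as quoted
SEARCH_USD_PER_1K_QUERIES = 15.0   # Bing Search API, as quoted
TOKENS_PER_QUERY = 1_000

# Normalize both prices to "USD per single user query".
llm_per_query = LLM_USD_PER_1M_TOKENS / 1_000_000 * TOKENS_PER_QUERY
search_per_query = SEARCH_USD_PER_1K_QUERIES / 1_000

print(f"LLM:    ${llm_per_query:.4f} per query")
print(f"Search: ${search_per_query:.4f} per query")
print(f"Search / LLM: {search_per_query / llm_per_query:.1f}x")
```

Normalizing both sides to a single query is what makes the comparison fair: 0.0004 USD for the LLM answer versus 0.015 USD for the search call.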