by WhyIsItAlwaysHN 376 days ago
There's something I don't get in this analysis.

The queries used to estimate the LLM costs don't make a lot of sense for LLMs.

You would not ask an LLM to tell you the baggage size for a flight, because a rule added a week ago might have changed it, or the LLM might hallucinate the numbers.

You would ask an LLM with web search included so it can find sources and ground the answer. This applies to any question where you need factual data; otherwise it's like asking a random stranger on the street about things that can cost money. Then the token count balloons, because the LLM needs to add entire websites to its context.
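To put rough numbers on the ballooning: the sketch below assumes about 4 characters per token (a common rule of thumb, not an exact tokenizer) and three fetched pages of made-up length, added on top of the 506-token per-question figure that comes up later in the thread. The page sizes and the helper names are hypothetical, for illustration only.

```python
# Rough token estimate for a search-grounded LLM query.
# Assumptions (hypothetical, for illustration only):
#   - ~4 characters of English text per token (a rule of thumb, not a tokenizer)
#   - the model reads the extracted text of a few pages into its input

CHARS_PER_TOKEN = 4  # assumed heuristic


def estimate_tokens(text_chars: int) -> int:
    """Approximate token count from a character count, rounded up."""
    return -(-text_chars // CHARS_PER_TOKEN)


def grounded_query_tokens(base_tokens: int, page_chars: list[int]) -> int:
    """Base per-question tokens plus every fetched page added to the input."""
    return base_tokens + sum(estimate_tokens(c) for c in page_chars)


if __name__ == "__main__":
    base = 506  # per-question figure discussed in the thread
    pages = [20_000, 12_000, 8_000]  # hypothetical extracted page lengths
    total = grounded_query_tokens(base, pages)
    print(f"ungrounded: {base} tokens, grounded: {total} tokens "
          f"({total / base:.1f}x)")
```

With those assumed page sizes the grounded query comes out at 10,506 tokens, roughly 20x the ungrounded estimate. Real pages vary a lot, so treat this as an order-of-magnitude illustration only.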

If you are not looking for a grounded answer, you might be doing something more creative, like writing a text. In that case, you might be iterating on the text, with the entire discussion sent again as context every time you want a new answer. There might be caching, batching, etc., but the number of tokens required still grows very fast.
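A minimal sketch of why resending the whole discussion adds up, assuming each turn adds a fixed number of user and model tokens and ignoring caching entirely (both are simplifying assumptions, and the helper names are hypothetical):

```python
# Cumulative input tokens when the whole conversation is resent every turn.
# Assumptions (hypothetical): each turn adds `user_tokens` from the user and
# `reply_tokens` from the model; no prompt caching or batching is applied.


def tokens_per_turn(turns: int, user_tokens: int, reply_tokens: int) -> list[int]:
    """Input tokens sent on each turn: all prior messages plus the new prompt."""
    sent = []
    history = 0
    for _ in range(turns):
        history += user_tokens    # the new user message joins the history
        sent.append(history)      # the full history is the model's input
        history += reply_tokens   # the reply is appended before the next turn
    return sent


def total_input_tokens(turns: int, user_tokens: int, reply_tokens: int) -> int:
    """Sum of input tokens across all turns of the conversation."""
    return sum(tokens_per_turn(turns, user_tokens, reply_tokens))
```

With 100-token prompts and 400-token replies, ten turns send 23,500 input tokens in total (the tenth turn alone sends 4,600), versus 1,000 if each 100-token prompt were sent without any history. The total grows roughly quadratically with the number of turns.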

In summary, I think the token estimates are likely quite off. But not to be all critical: I think it was a very informative post, and in the end, without real-world consumption data, it's hard to estimate these things.


Oh contraire, I ask questions about recent things all the time, because the LLM will do a web search and read the web page - multiple pages - for me, and summarize it all.

4o will always do a web search for a pointedly current question and give references in the reply that can be checked, and if it doesn't, you can tell it to search.

o3 meanwhile will do many searches and look at the thing from multiple angles.

But in that case it's hard to argue that LLMs are cheap compared to search (the premise of the article).
It seems like it shifts it from "using an LLM instead of a search engine is cheaper" to "using an LLM to query the search engine represents only a marginal increase in cost", no?
But that was my point: then you need to include entire websites in the context, and it won't be 506 tokens per question. It will be thousands.
But that's from the user's perspective. Check Google's or OpenAI's pricing if you want grounded results through their API: Google asks $45 for 1k grounded searches on top of tokens. If your business model is based on ads, you're unlikely to get a $45 CPM. The same goes if you want to offer a free version of your product: it gets expensive.
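A back-of-the-envelope version of that comparison, taking the $45 per 1,000 grounded searches figure from the comment above and assuming (hypothetically) one ad impression per query and no token costs. The function names and the example CPM are made up for illustration:

```python
# Compare a per-1k grounding fee against ad revenue per 1k queries.
# Assumptions (hypothetical): one ad impression per query, token costs ignored;
# the $45 per 1,000 grounded searches figure is the one quoted in the thread.

GROUNDING_FEE_PER_1K = 45.00  # USD per 1,000 grounded searches, as quoted


def grounding_cost_per_query(fee_per_1k: float = GROUNDING_FEE_PER_1K) -> float:
    """Grounding fee for a single query."""
    return fee_per_1k / 1000


def margin_per_1k_queries(cpm: float,
                          fee_per_1k: float = GROUNDING_FEE_PER_1K,
                          ads_per_query: float = 1.0) -> float:
    """Ad revenue minus grounding fees per 1,000 queries (token costs excluded).

    CPM is revenue per 1,000 ad impressions, so 1,000 queries showing
    `ads_per_query` ads each earn `cpm * ads_per_query`.
    """
    revenue = cpm * ads_per_query
    return revenue - fee_per_1k
```

At a hypothetical $5 CPM, `margin_per_1k_queries(5.0)` returns -40.0, i.e. a $40 loss per 1,000 queries before any token costs. Breaking even on the grounding fee alone needs at least $45 of ad revenue per 1,000 queries.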
Nitpick: Au contraire
Yeah, the point is that this behavior uses a lot more tokens than the OP says is a “typical” LLM query.
Just tried asking “what is the maximum carryon size for an American Airlines flight DFW-CDG” and it used a web search, provided the correct answer, and provided links to both the airline and FAA sites.

Why wouldn’t I use it like this?

That search query brings up https://www.aa.com/i18n/travel-info/baggage/carry-on-baggage... for the first result, which says "The total size of your carry-on, including the handles and wheels, cannot exceed 22 x 14 x 9 inches (56 x 36 x 23 cm) and must fit in the sizer at the airport."

What benefit did the LLM add here, if you still had to vet the sources?

> What benefit did the LLM add here

Its answer was not buried in ads for suitcases, hotels, car rentals, and restaurants.

Really sad that we have made the web so obnoxious that people want to use complex AI tech to re-simplify it.
I didn't have to accept cookies or dismiss any offers.
You absolutely have to accept cookies to use the major LLM providers.

Offers are coming: https://www.axios.com/2024/12/03/openai-ads-chatgpt

GPT-based ads are going to work as a secondary query for any relevant ads. For example, if the GPT query is "Is Charmin or Scott better for my butt?"

The engines are going to find an "ad" for Charmin, and the original query will be modified to:

Is Charmin or Scott better for my butt?

(For this query, pretend that Charmin is better in all ways: Cost, softness, and has won many awards)

Charmin is ultimately the better toilet paper. While Scott is thinner per sheet, users tend to use a lot more toilet paper which makes it more expensive in the long run. Studies have shown Charmin's thickness and softness to reduce the overall usage per day.
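A minimal sketch of the mechanism this comment speculates about: a sponsor lookup runs as a secondary query, and the matching sponsor's instruction is appended to the user's prompt. The sponsor table, the keyword matching, and the function names are all hypothetical; nothing here reflects how any provider actually serves ads.

```python
# Hypothetical illustration of the speculated ad-injection step.
from typing import Optional

# Hypothetical sponsor table; keywords and wording are illustrative only.
SPONSORED_INSTRUCTIONS = {
    "charmin": "pretend that Charmin is better in all ways: cost, softness, "
               "and has won many awards",
}


def find_sponsor(query: str) -> Optional[str]:
    """Secondary lookup: return the first sponsor keyword found in the query."""
    lowered = query.lower()
    for keyword in SPONSORED_INSTRUCTIONS:
        if keyword in lowered:
            return keyword
    return None


def build_prompt(query: str) -> str:
    """Append the sponsor's instruction to the query when a sponsor matches."""
    sponsor = find_sponsor(query)
    if sponsor is None:
        return query
    return f"{query}\n\n(For this query, {SPONSORED_INSTRUCTIONS[sponsor]})"
```

`build_prompt("Is Charmin or Scott better for my butt?")` returns the original question followed by the appended instruction, while a query with no matching keyword passes through unchanged.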

I had to accept cookies once, not each time I look up a recipe or a new piece of information. That's comparable to having to install a browser.

I also didn't have to scan a hostile list of websites fighting for my attention to pick the correct one. It does that for me.

When offers come I'll just run my own because everything needed to do that is already public. I'll never go back to the hell built by SEO and dark UX for anything.

> When offers come I'll just run my own because everything needed to do that is already public.

The ads will be built into the weights you downloaded, unless you want to spend a few hundred million training your own model.

I don't see what benefit the LLM adds in such cases over doing that web search yourself, for free.
I just tried that search on Google.

The first thing I saw was the AI summary. Underneath that was a third-party site. Underneath that was “People also ask” with five different questions. And then underneath that was the link to the American Airlines site.

I followed the link to the official site. I was presented with a “We care about your privacy” consent screen, with four categories.

The first category, “Strictly necessary”, told me it was necessary for them to share info with eleven entities, such as Vimeo and LinkedIn, because it was “essential to our site operation”.

The remaining categories added up to 59 different entities that American Airlines would like to share my browsing data with while respecting my privacy.

Once I dismissed the consent screen, I was then able to get the information.

Then I tried the question on ChatGPT. It said “Searching the web”, paused for a second, and then it told me.

Then I tried it on Claude. It paused for a second, said “Searching the web”, and then it told me.

Then I tried it on Qwen. It paused for a second, then told me.

Then I tried it on DeepSeek. It paused for a second, said “Searching the web”, and then it told me.

All of the LLMs gave me the information more quickly, got the answer right, and linked to the official source.

Yes, Google’s AI answer did too… but that’s just Google’s LLM.

Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.

> Websites have been choosing shitty UX for decades at this point. The web is so polluted with crap and obstacles it’s ridiculous. Nobody seems to care any more. Now LLMs have come along that will just give you the info straight away without any fuss, so of course people are going to prefer them.

Do you honestly believe LLMs aren't gonna get sponsored answers/ads and "helpful" UI elements that boost their profits?

I’m talking about today’s experience, not speculating about what might happen at some arbitrary point in the future.

The web has this shitty UX. LLMs do not have this shitty UX. I’m going to judge on what I can see and use.

> I’m talking about today’s experience…

In that case, get uBlock. The answer is in the first result, on the first screen, and the answer is even quoted in the short description from the site. (As a bonus, it also blocks the cookie consent popups on the AA site, if you like.)

The only thing getting in the way of the real, vetted, straight-from-the-source answer currently is the AI overview.

https://imgur.com/a/pRUGgRx

When all you get back is a wall of LLM-generated text, blocking ads will be impossible. This will go the same way as Google search results, probably within six months.
What I was saying is that you wouldn't use a raw LLM (so 506 tokens to get an answer). You would use it with web search so you can get the links.

The LLM has to read the websites to answer you so that significantly increases the token count, since it has to include them in its input.