Hacker News
by wkat4242 620 days ago
One of the things that kinda illustrates this for me is that an LLM always takes the same time to process a prompt of the same length, no matter how complicated the problem is. Obviously the complexity of the problem is not actually taken into account.
3 comments

This is only true if the output is the same length (which should be exceptionally rare if the input text is different).
That's true, I was talking about tokens/sec output but I should have specified.
The o1 model definitely has a fairly big variance in how long a task takes, depending on what you ask it to do.
True, the o1 model is the one exception, though it's really more of a chain of LLMs. I wouldn't consider it a pure LLM.

Also, o1 still fails at many mathematical tasks, as the linked article makes clear.

You don't see the majority of tokens it is generating.
Yes, I'm not claiming that is true formal reasoning, but it is certainly more of a chain of thought than was previously being done, and it does indicate that some questions require more "thought" than others.
Wait, what? Is that real?
Yes. In the end, LLMs are a sequence of matrix multiplications, and since they don't loop internally, every output token gets the same number of internal processing steps, no matter what the input is. Only the input length is relevant, because some steps can be skipped if the input buffer is not full.
Yes. OpenAI's o1 model is an attempt to address this by letting the model choose to "think": it generates hidden tokens for a variable amount of time before producing the visible output tokens. But each token, whether hidden or visible, still takes a fixed amount of compute.
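To put rough numbers on the point above, here is a back-of-the-envelope sketch in Python. It is not a measurement of any real model. It assumes a standard decoder-only transformer with a KV cache and a 4x feed-forward expansion, and it ignores the cost of processing the prompt itself; the function names and the default sizes (32 layers, width 4096) are made up for illustration.

```python
from typing import List


def flops_per_token(n_layers: int, d_model: int, context_len: int) -> int:
    """Approximate multiply-adds needed to generate ONE token.

    Assumed cost model (a common approximation, not an exact count):
      - attention projections (Q, K, V, output): 4 * d_model^2 per layer
      - feed-forward with 4x expansion:          8 * d_model^2 per layer
      - attending over the cached positions:     2 * context_len * d_model per layer
    """
    per_layer = 12 * d_model * d_model + 2 * context_len * d_model
    return n_layers * per_layer


def generation_cost(prompt: List[str], n_generated: int,
                    n_layers: int = 32, d_model: int = 4096) -> int:
    """Total approximate cost of generating n_generated tokens after the prompt.

    Only len(prompt) is used: what the prompt says never enters the calculation.
    """
    total = 0
    for i in range(n_generated):
        # The context grows by one token per step, so attention gets slightly
        # more expensive, but the weight multiplications stay the same.
        total += flops_per_token(n_layers, d_model, len(prompt) + i)
    return total


easy = "what is two plus two ?".split()
hard = "prove the riemann hypothesis now ?".split()

# Same prompt length and same output length give the same cost.
print(generation_cost(easy, 20) == generation_cost(hard, 20))

# More generated tokens (hidden or visible) is the only way to spend more compute.
print(generation_cost(hard, 200) > generation_cost(hard, 20))
```

Under these assumptions, the difficulty of the question can only show up in the cost through the number of tokens generated, which is exactly the knob that hidden "thinking" tokens turn.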
We really really really need to disambiguate the LLM, which is a fixed-length, fixed-compute-time process that takes in an input and produces a token distribution, from the AI system, which takes the output of the LLM and eventually produces something for the user.

In this case, all LLMs are fixed-length, but not all AI systems are. An LLM on its own is useless. Current SoTA research includes inserting 'pause' tokens. This is something that, when combined with an AI system that understands these, would enable variable time 'thinking'.
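The LLM versus AI system split can be sketched as a small loop, shown here in Python. This is a toy under stated assumptions, not how any vendor implements it: `toy_step` is a scripted stand-in for one fixed-cost forward pass, and the token names `<pause>` and `<eos>` as well as `run_system` are hypothetical.

```python
from typing import Callable, List, Tuple

PAUSE = "<pause>"  # hypothetical "keep thinking" token
EOS = "<eos>"      # hypothetical end-of-sequence token


def toy_step(context: List[str]) -> str:
    """Stand-in for ONE fixed-cost LLM call: context in, next token out.

    Scripted behaviour (not a real model): prompts containing "hard" get
    three pause tokens before the answer, everything else gets none.
    """
    pauses_wanted = 3 if "hard" in context else 0
    if context.count(PAUSE) < pauses_wanted:
        return PAUSE
    if "answer" in context:
        return EOS
    return "answer"


def run_system(step: Callable[[List[str]], str], prompt: List[str],
               max_steps: int = 50) -> Tuple[List[str], int]:
    """The surrounding "AI system": calls the LLM repeatedly, hides pause tokens.

    Returns the visible tokens and the number of LLM calls that were made.
    """
    context = list(prompt)
    visible: List[str] = []
    calls = 0
    while calls < max_steps:
        token = step(context)  # every call costs the same, pause or not
        calls += 1
        context.append(token)
        if token == EOS:
            break
        if token != PAUSE:
            visible.append(token)  # pause tokens never reach the user
    return visible, calls


print(run_system(toy_step, ["easy", "question"]))  # (['answer'], 2)
print(run_system(toy_step, ["hard", "question"]))  # (['answer'], 5)
```

Each call to `step` does the same amount of work; the variable "thinking time" comes entirely from how many times the loop decides to call it.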

Yes. AIs come in all sorts of flavours.

I think the main thing that happened with LLMs was that people anthropomorphise them because they finally understand what's going on. Other AIs might be smarter by solving complicated mathematical problems but most people don't speak that language so they're not impressed.

LLM vendors should really make this clear but they don't because a magical thinking machine sells well.

> LLM vendors should really make this clear but they don't because a magical thinking machine sells well.

Hold on though... modern LLM systems, like ChatGPT 4o et al., do stop and think. The vendors are not selling LLMs. LLMs are an implementation detail. They're selling AI systems: the LLM plus the controlling software.

Yes, you never tried it? I always get the same tokens/s from my local LLM setup no matter what I put in (and because it's local there are no hidden resources the cloud might have added to solve my extra-hard problem).

It does depend on the context + prompt length, but for a given length the results are pretty static. It's clear to me that an LLM doesn't actually reason. That's not something it's really been built to do, so I'm not sure if it's a bad thing. The problem is more that people expect it to do that, probably because it sounds so human that they ascribe human-like skills to it.

No, the processing time depends on the length of generated output.