| HN Mirror


	by wkat4242 620 days ago
	One of the things that kinda illustrates this for me is that an LLM always takes the same time to process a prompt of the same length, no matter how complicated the problem is. Obviously the complexity of the problem is not actually taken into account.

3 comments

acchow 620 days ago

This is only true if the output is the same length (which should be exceptionally rare if the input text is different).
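
A rough way to see this point is a two-term latency model: one prefill pass over the prompt, then one decode step per generated token. The function name and the throughput numbers below are hypothetical, and the model ignores the fact that decoding slows down somewhat as the context grows.

```python
def estimated_latency(n_prompt, n_output, prefill_tok_per_s, decode_tok_per_s):
    """Rough wall-clock estimate in seconds (hypothetical constant throughputs).

    Prefill processes the whole prompt; decoding then produces one token at a time,
    so total time grows with the number of output tokens."""
    prefill = n_prompt / prefill_tok_per_s
    decode = n_output / decode_tok_per_s
    return prefill + decode


# Same 500-token prompt, different answer lengths.
short_answer = estimated_latency(500, 100, 1000.0, 20.0)
long_answer = estimated_latency(500, 400, 1000.0, 20.0)
```

With identical prompts, the difference in total time comes entirely from the output length, while the tokens/sec rate stays the same.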

wkat4242 620 days ago

That's true. I was talking about output tokens/sec, but I should have specified.

obmelvin 620 days ago

The o1 model definitely has a fairly large variance in how long a task takes, depending on what you ask it to do.

wkat4242 620 days ago

True, the o1 model is the one exception, though it's really more of a chain of LLMs. I wouldn't consider it a pure LLM.

Also, o1 still fails at many mathematical tasks, as the linked article explains.

robterrell 620 days ago

You don't see the majority of tokens it is generating.

obmelvin 620 days ago

Yes, I'm not claiming that it is true formal reasoning, but it is certainly more of a chain of thought than was being done before, and it does indicate that some questions require more "thought" than others.

lifeisstillgood 620 days ago

Wait, what? Is that real?

fxtentacle 620 days ago

Yes. In the end, LLMs are a sequence of matrix multiplications, and since they don't loop internally, every output token gets the same number of internal processing steps, no matter what the input is. Only the input length is relevant, because some steps can be skipped if the input buffer is not full.
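
A toy sketch of this idea, assuming a tiny made-up causal model (hypothetical width `D`, depth `LAYERS`, random weights, and a crude single-head attention). It counts multiplications during one forward pass. The loops run over layers and positions only, so the count depends on the input length and never on which tokens are present. Token IDs stand in for an "easy" and a "hard" prompt.

```python
import math
import random

D = 8       # hypothetical hidden width
LAYERS = 4  # hypothetical fixed depth: the layer loop never depends on the input

rng = random.Random(0)
WEIGHTS = [[[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)]
           for _ in range(LAYERS)]


def embed(token_id):
    """Hypothetical embedding: a fixed pseudo-random vector per token id."""
    r = random.Random(token_id)
    return [r.uniform(-1, 1) for _ in range(D)]


def forward(token_ids):
    """Toy causal forward pass.

    Returns (last hidden vector, multiplication count). Counted: dot products,
    attention-weighted sums and the dense layer; exp and division are left out."""
    mults = 0
    hidden = [embed(t) for t in token_ids]
    for layer in WEIGHTS:                        # fixed number of layers
        new_hidden = []
        for i, h in enumerate(hidden):
            ctx = hidden[: i + 1]                # causal: position i sees 0..i
            scores = [sum(a * b for a, b in zip(h, p)) for p in ctx]
            mults += D * len(ctx)
            m = max(scores)
            weights = [math.exp(s - m) for s in scores]  # softmax numerators
            total = sum(weights)
            mixed = [sum(w * p[k] for w, p in zip(weights, ctx)) / total
                     for k in range(D)]
            mults += D * len(ctx)
            out = [sum(w * x for w, x in zip(row, mixed)) for row in layer]
            mults += D * D
            new_hidden.append(out)
        hidden = new_hidden
    return hidden[-1], mults


def expected_mults(n):
    """Closed form for the count above: a function of n, D and LAYERS only."""
    return LAYERS * (D * n * (n + 1) + n * D * D)


_, easy_cost = forward([1, 2, 3, 4])       # stand-in for an "easy" prompt
_, hard_cost = forward([97, 13, 55, 8])    # stand-in for a "hard" prompt
```

Both prompts cost exactly `expected_mults(4)` multiplications, while a longer prompt costs more. In real inference with a KV cache, generating one more token runs only the newest position through the layers, so the per-token work still scales with context length, not with how hard the question is.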

modeless 620 days ago

Yes. OpenAI's o1 model is an attempt to address this, by letting the model choose to "think" by generating hidden tokens for a variable amount of time before producing the visible output tokens. But each token, whether hidden or visible, still takes a fixed amount of compute.
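
A back-of-the-envelope sketch of the compute involved, using the common rule of thumb that a forward pass costs roughly 2 FLOPs per parameter per token. The function name and the parameter count are hypothetical (o1's size is not public), and the attention term that grows with context length is ignored.

```python
def forward_flops_estimate(n_params, n_hidden, n_visible):
    """Rough forward-pass FLOPs for one response (~2 * n_params per token).

    Hidden "thinking" tokens cost the same per token as visible ones;
    the attention cost that grows with context length is ignored."""
    tokens = n_hidden + n_visible
    return 2 * n_params * tokens


N_PARAMS = 10**9  # hypothetical model size

direct = forward_flops_estimate(N_PARAMS, n_hidden=0, n_visible=200)
with_thinking = forward_flops_estimate(N_PARAMS, n_hidden=2000, n_visible=200)
```

The visible answer is the same length in both cases, but spending 2000 hidden tokens multiplies the total compute by 11, which is where the variable response time comes from.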

anon291 620 days ago

We really, really, really need to disambiguate the LLM, which is a fixed-length, fixed-compute-time process that takes an input and produces a token distribution, from the AI system, which takes the output of the LLM and eventually produces something for the user.

In this case, all LLMs are fixed-length, but not all AI systems are. An LLM on its own is useless. Current SoTA research includes inserting 'pause' tokens, which, combined with an AI system that understands them, would enable variable-time 'thinking'.

wkat4242 620 days ago

Yes. AIs come in all sorts of flavours.

I think the main thing that happened with LLMs is that people anthropomorphise them because they can finally understand what's going on. Other AIs might be smarter, solving complicated mathematical problems, but most people don't speak that language, so they're not impressed.

LLM vendors should really make this clear but they don't because a magical thinking machine sells well.

anon291 620 days ago

> LLM vendors should really make this clear but they don't because a magical thinking machine sells well.

Hold on, though... modern LLM systems, like ChatGPT 4o et al., do stop and think. The vendors are not selling LLMs; LLMs are an implementation detail. They're selling AI systems: the LLM plus the controlling software.

wkat4242 620 days ago

Yes, have you never tried it? I always get the same tokens/s from my local LLM setup no matter what I put in (and because it's local, there are no hidden resources the cloud might have added to solve my extra-hard problem).

It does depend on the context + prompt length, but for those the results are pretty static. It's clear to me that an LLM doesn't actually reason, which is not something it was really built to do, so I'm not sure that's a bad thing. The problem is more that people expect it to. Probably because it sounds so human, they ascribe human-like skills to it.

lostmsu 620 days ago

No, the processing time depends on the length of generated output.
