

	by piotr93 594 days ago
	The only computations that an LLM does are backprops and forward passes. It cannot run an arbitrary program description. Yes, it will hallucinate your program's output if you feed it a good enough starting prompt. But that's it.

2 comments

logicchains 594 days ago

An LLM with chain of thought and unbounded compute/context can run any program in PTIME: https://arxiv.org/abs/2310.07923, which is a huge class of programs.


Vetch 594 days ago

Note that this is an expressibility (upper) bound on transformers granted intermediate decoding steps. It says nothing about their learnability, and modern LLMs are not near that level of expressive capacity.

The authors also introduce projected pre-norm and layer-norm hash to facilitate their proofs. Since these concepts are not standard, this is another sense in which the result is an upper bound on the current approach to AI. Nonetheless, the paper shows that allowing a number of intermediate decoding steps polynomial in the input size is already enough to run most programs of interest (which are in P).
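To make "intermediate decoding steps" concrete, here is a toy sketch, not the paper's construction. The hypothetical `step` function stands in for one forward pass: it does a fixed amount of work on what has already been written down. Looping it and appending each output to a transcript is what lets a multi-step program (Euclid's algorithm here, chosen only for illustration) run to completion.

```python
# Toy illustration only: `step` is a hand-written stand-in for one forward pass,
# not a transformer. All names here are hypothetical.

def step(transcript):
    """Read the last written state (a, b) and emit the next state of Euclid's algorithm."""
    a, b = transcript[-1]
    return (b, a % b)


def run_with_scratchpad(a, b, max_steps):
    """Call `step` repeatedly, appending each output to the transcript (the 'chain of thought')."""
    transcript = [(a, b)]
    while transcript[-1][1] != 0:          # halting state: second component is 0
        if len(transcript) > max_steps:    # budget on intermediate steps
            raise RuntimeError("step budget exhausted")
        transcript.append(step(transcript))
    return transcript[-1][0], transcript


answer, trace = run_with_scratchpad(48, 18, max_steps=10)
print(answer)  # prints 6
print(trace)   # prints [(48, 18), (18, 12), (12, 6), (6, 0)]
```

With a budget of one step the same call fails, which is the point: the single step is cheap, and the computation lives in the number of steps allowed.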

There are additional issues. This work relies on the concept of saturated attention. However, as context length grows in real-world transformers, self-attention deviates from this model: it becomes noisier, with unimportant indices getting undue focus (IIUC, due to precision issues and how softmax assigns non-zero probability to every token). Finally, it's worth noting that the more under-specified your problem is, and the more complex the problem representation is, the more quickly the induced probabilistic inference problem becomes intractable. Unless you're explicitly (and wastefully) programming a simulated Turing machine through the LLM, this will be far from real-time interactive. Users should expect a Prolog-like experience of spending most of their time working out how to help the search.

Trivia: Softmax also introduces another problem. The way it is applied forces attention to always assign importance to some tokens, which often dumps focus on semantically unimportant tokens like whitespace. This overemphasis can induce spurious correlations on whitespace, which then propagate through the network with possibly unexpected negative downstream effects.
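A minimal sketch of the softmax point, under simplifying assumptions: one "important" token with a fixed score and every other token at score 0. The scores and the helper name `weight_on_target` are made up for illustration. Since softmax gives every token a strictly positive weight, the share left for the important token shrinks as the context grows.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weight_on_target(context_length, target_score=5.0, other_score=0.0):
    """Attention weight on one high-scoring token among many low-scoring ones."""
    scores = [target_score] + [other_score] * (context_length - 1)
    return softmax(scores)[0]


# The target's score never changes, but its weight is diluted by the
# non-zero mass assigned to every other position.
for n in (10, 1_000, 100_000):
    print(n, weight_on_target(n))
```

Real attention scores are learned and not uniform like this, so treat it as an illustration of the dilution effect, not a model of any particular network.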


csmpltn 594 days ago

> "An LLM with unbounded compute/context"

This isn't a thing we have, or will have.

It's like saying that a computer with infinite memory, CPU and power can certainly break SHA-256 and bring the world's economy down with it.


stavros 594 days ago

No, it's like this exchange: "A computer can't crack SHA hashes, it can only add and subtract numbers." "A computer can crack any SHA hash." "Yes, given infinite time."

The fact that you need infinite time for some of the stuff doesn't mean you can't do any of the stuff.
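As a toy version of that point: brute-forcing a SHA-256 preimage is a perfectly ordinary program, it is just only feasible when the search space is tiny. The sketch below assumes the secret is at most three lowercase letters; the function name and the limits are made up for illustration.

```python
import hashlib
import string
from itertools import product


def brute_force_sha256(target_hex, alphabet=string.ascii_lowercase, max_len=3):
    """Try every string over `alphabet` up to `max_len` characters; return the match or None."""
    for length in range(1, max_len + 1):
        for chars in product(alphabet, repeat=length):
            candidate = "".join(chars)
            if hashlib.sha256(candidate.encode()).hexdigest() == target_hex:
                return candidate
    return None  # not found within the (tiny) search space


target = hashlib.sha256(b"hn").hexdigest()
print(brute_force_sha256(target))  # prints "hn"
```

The candidate count grows as 26^n in the length n, which is why the same loop is hopeless against realistic inputs even though nothing about it is impossible in principle.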


hansonkd 594 days ago

I mean, it doesn't need to compute all programs in a humanly reasonable amount of time.

It just needs to be able to compute enough programs to be useful.

Even our current infrastructure of precisely defined programs and compilers isn't able to compute all programs.

It seems reasonable that in the future we'll be able to give an LLM the Python language specification and a Python program, and have it iteratively return the answer.


vrighter 593 days ago

If it's executing a program, then the easiest way to make it more efficient is to ditch the LLM and just execute the program. The LLM in this case is basically only (very very very very very inefficiently) approximating the very CPU it's running on. Just use the CPU to execute the program! Then you won't be running it on an approximated processor; you'll be running it on a deterministic, reliable one that will never give the wrong answer (given the right program and correct input, of course, and assuming no hardware failures, which would affect LLMs too).


vrighter 593 days ago

They're a neural network like any other. They are universal function approximators. That doesn't mean that approximating a function that executes a particular program won't require an intractably large neural network.


piotr93 594 days ago

Woah super interesting, I didn't know about this. Will def read it! Seems like I was wrong?


FezzikTheGiant 594 days ago

genuinely what's the point of this comment? are you allergic to cool stuff? honestly curious as to what you were trying to achieve by this comment.

nowhere in this post does the author say that it's ready with the current state of models, or he'd use a foundation model for this. why the hate?
