Hacker News
by clooper 823 days ago
I was recently thinking how every neural network is equivalent to a lookup table where the input is all numbers up to what can be expressed within the context window and the output is the result of the arithmetic operations applied to that number. So every neural network is equivalent to T = {(i, f(i)) : i < K} where K is the constant which determines the context window and f is the numerical function implemented by the network. Can someone ask a neural network if my reasoning is valid and correct?

The main practical issue is the size of the table but I don't see any theoretical reasons why this is incorrect. The neural network is simply a compressed representation of the uncompressed lookup table. Given that the two representations are theoretically equivalent and a lookup table does not perform any reasoning we can conclude that no neural network is actually doing any thinking other than uncompressing the table and looking up the value corresponding to the input number.
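To make the table T concrete, here is a minimal Python sketch. The network `tiny_network`, its weights, and the 3-bit input domain are all hypothetical, chosen only so the full table stays small; a real model's input space is vastly larger.

```python
from itertools import product

def tiny_network(bits):
    """A hypothetical two-layer network with fixed integer weights and ReLU."""
    w1 = [[1, -1, 2], [-2, 1, 1]]   # 2 hidden units, 3 inputs (made-up weights)
    w2 = [1, -1]                    # output layer (made-up weights)
    hidden = [max(0, sum(w * x for w, x in zip(row, bits))) for row in w1]
    return sum(w * h for w, h in zip(w2, hidden))

# T = {(i, f(i)) : i < K}: enumerate every possible input once.
table = {bits: tiny_network(bits) for bits in product((0, 1), repeat=3)}

def lookup(bits):
    """Answer by table lookup only; no arithmetic is performed here."""
    return table[tuple(bits)]

print(len(table), "rows")
print(all(lookup(b) == tiny_network(b) for b in table))
```

On this finite domain the two are indistinguishable from the outside, which is the equivalence the post describes. The sketch says nothing about whether that equivalence settles the question of "thinking".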

Modern neural networks have some randomness but that doesn't change the table in any meaningful way because instead of the output being a number it becomes a distribution over some finite range which can again be turned into a table with some tuples.

11 comments

This reminds me of the classic problem in computation, where the simplest form of computation, the lookup table, input -> output, is limited to a finite domain. Turing modified the computation to have a finite internal state and infinite external environment (tape), so it becomes a transition function (state, stimulus) -> (new state, response), applied recursively in a feedback loop, allowing it to operate on infinite domains.

Famously a simple lookup table for the transition function then suffices to compute any computable function.
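As a rough illustration of a transition function stored as a plain lookup table, here is a sketch of a small machine that adds one to a binary number. The machine, its state names, and the `run` helper are hypothetical examples, not anything from the comment above.

```python
# (state, symbol read) -> (new state, symbol to write, head movement)
TRANSITIONS = {
    ("carry", "1"): ("carry", "0", -1),  # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("halt", "1", 0),    # 0 + carry = 1, done
    ("carry", "_"): ("halt", "1", 0),    # ran off the left edge: new digit
}

def run(tape_str):
    """Increment a binary number written on the tape, most significant bit first."""
    tape = dict(enumerate(tape_str))     # sparse tape; missing cells are blank "_"
    head, state = len(tape_str) - 1, "carry"
    while state != "halt":
        symbol = tape.get(head, "_")
        state, tape[head], move = TRANSITIONS[(state, symbol)]
        head += move
    return "".join(tape[i] for i in sorted(tape))

print(run("1011"))  # 11 + 1 = 12
print(run("111"))   # 7 + 1 = 8, the tape grows by one cell
```

The table itself has three rows, yet the machine handles inputs of any length, because the loop and the unbounded tape do the work that a plain input-to-output table would need infinitely many rows for.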

Have a look at Post's correspondence problem for even crazier universal models of computation. Or at Fractran.

Simplified for Post's correspondence problem, you have a set of playing cards with text written on the front and back. (You can make copies of cards in your set.)

The question is, can you arrange your cards in such a way, that they spell out the same total text on the front and back?

As an example your cards might be: [1] (a, baa), [2] (ab, aa), and [3] (bba, bb). One solution would be (3, 2, 3, 1) which spells out bbaabbbaa on both sides.

Deciding whether a set of cards has a solution is undecidable in general: the problem is powerful enough to encode the halting problem.
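The card example above can be checked mechanically. This is a small sketch; `spell`, `is_solution`, and `search` are names made up for illustration. The search is bounded by a maximum length, so a `None` result only means "nothing found up to that length", not "no solution exists".

```python
from itertools import product

# The three cards from the example: card number -> (front text, back text)
CARDS = {1: ("a", "baa"), 2: ("ab", "aa"), 3: ("bba", "bb")}

def spell(sequence, cards=CARDS):
    """Concatenate the fronts and the backs of the cards in the given order."""
    front = "".join(cards[i][0] for i in sequence)
    back = "".join(cards[i][1] for i in sequence)
    return front, back

def is_solution(sequence, cards=CARDS):
    front, back = spell(sequence, cards)
    return len(sequence) > 0 and front == back

def search(max_len, cards=CARDS):
    """Brute-force search over all arrangements of up to max_len cards."""
    for n in range(1, max_len + 1):
        for seq in product(sorted(cards), repeat=n):
            if is_solution(seq, cards):
                return seq
    return None

print(spell((3, 2, 3, 1)))
print(search(3), search(4))
```

No bound on the search length is guaranteed to be large enough for every card set, which is where the hardness comes from.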

That's a good point.
It sounds like you're asking whether the output of a neural network is a deterministic function of its input. For many LLMs, you can make that answer yes with the right combination of parameters (temperature = 0) and underlying compute (variance in floating point calculations can still introduce randomness in model outputs even when the model should theoretically return the same answer every time).

There are some ways to introduce stochasticity:

1. Add randomness. The temperature or "creativity" hyperparameter in most LLMs does this, as do some decoders. The hardware these models run on can also add randomness.

2. Add some concept of state. RNNs do this, some of the approaches which give the LLM a scratch pad or external memory do this, and continuous pre-training sort of does this.
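A minimal sketch of the temperature point, assuming a toy next-token step: `sample_token` and the example logits are hypothetical, and real decoders differ in many details. Treating temperature 0 as plain argmax is a common convention, used here as an assumption.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Pick a token index from raw scores; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)                                # subtract max for stability
    weights = [math.exp(s - m) for s in scaled]    # unnormalised softmax
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [1.0, 3.0, 2.0]                           # made-up scores for 3 tokens
rng = random.Random(0)
print([sample_token(logits, 0, rng) for _ in range(5)])    # always the same index
print([sample_token(logits, 1.0, rng) for _ in range(5)])  # varies with the RNG
```

With a fixed seed even the sampled case is reproducible, so the "randomness" is itself a deterministic function of (input, seed) in this sketch. Floating-point variance on real hardware is a separate source of nondeterminism that this example does not model.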

How this affects people's perception of LLMs as thinking machines, I don't know. What if someone took every response I ever gave to every question that was ever asked of me in my life and made a Chinese Room[1] version of me? A lookup table that is functionally identical to my entire existence. In what contexts is the difference meaningful?

[1] https://en.wikipedia.org/wiki/Chinese_room

To your last point, https://en.m.wikipedia.org/wiki/Problem_of_induction

A LUT version of you is inductive. Every observed input/output pair does not uniquely identify your current state. Much like a puddle left by a melted ice cube indicates its volume, but little to nothing of its shape.

Post LUT-you genesis, applying property based fuzz testing would quickly reveal that the LUT-you is one of an infinite number of LUT-yous that melts into the puddle of historical data, but not the LUT-you that is the original ice cube.

https://fsharpforfunandprofit.com/posts/property-based-testi...
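The puddle argument can be sketched in a few lines. Everything here is a hypothetical stand-in: `original` plays the role of the real system, the table is built only from inputs that happened to be observed, and the random search is a hand-rolled imitation of property-based testing rather than a real testing library.

```python
import random

def original(x):
    """Stand-in for the 'real' system: an arbitrary function of one integer."""
    return (x * x + 1) % 97

# Historical data: only the inputs that happened to be observed.
observed_inputs = range(0, 20)
lut = {x: original(x) for x in observed_inputs}

def lut_version(x, default=0):
    """The table built from history; it must guess for anything unseen."""
    return lut.get(x, default)

def find_counterexample(trials=1000, seed=0):
    """Property under test: lut_version(x) == original(x) for every x."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = rng.randrange(0, 10_000)
        if lut_version(x) != original(x):
            return x
    return None

print(all(lut_version(x) == original(x) for x in observed_inputs))
print(find_counterexample())
```

The table agrees with the original on every recorded input, yet a handful of random probes outside the record expose the difference. Any other choice of `default` would give a different table that fits the same history equally well.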

People can not be reduced to lookup tables even in theory. No one even knows how a single cell does what it does let alone an entire organism like a person.

I'm not making an abstract claim about neural networks because all numerical algorithms like neural networks can be reduced to a lookup table given a large enough hard drive. This is not practical because the space required would exceed the number of atoms in the known universe but the argument is sound. The same isn't true for people unless a person is idealized and abstracted into a sequence of numbers. I'm not saying no one is allowed to think of people as some sequence of numbers but this is clearly an abstraction of what it means to be a person and in the case of the neural network there is no abstraction, it really is a numerical function which can be expanded into a large table which represents its graph.

>People can not be reduced to lookup tables even in theory

Sure you can. Simply enumerate all of the physical states that the atoms in your body could be in. Any finite-sized object has a finite number of possible states, and so can be represented by a finite lookup table.

Your argument is so broad as to be meaningless.

Then give some concrete numbers for the states of the atoms. My argument is not abstract, it is very concrete. Give me a neural network and I can generate the graph and prove the equivalence between the network and its graph representation as a table of tuples.
You said "even in theory" which is obviously wrong, since the (local) universe is finite and deterministic, hence it is itself a giant lookup table.
> the (local) universe is finite and deterministic,

Radioactive decay and spontaneous pair production say otherwise on the deterministic front.

> the (local) universe is finite and deterministic

"It is not possible for the Universe being deterministic at any level. Only theories can be deterministic, practical reality is never"[0]

Q: Can you calculate your local universe's past states given its present state?

[0] https://philosophy.stackexchange.com/questions/99163/is-it-p...

Where are you going to get all that time and space to build a lookup table? Are you sure you're able to measure all state at enough precision to make an accurate table?
Simply not true. Of course it is comforting for computer people to believe the world they live in is a giant computer, but that is not our real reality.
"""the number of bits required to perfectly recreate the natural matter of the average-sized U.S. adult male human brain down to the quantum level on a computer is about 2.6×10^42 bits of information (see Bekenstein bound for the basis for this calculation).""" - https://en.wikipedia.org/wiki/Orders_of_magnitude_(data)

(That said, I think quantum physics makes it "all a Markov chain" rather than "all a lookup table").
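For reference, the 2.6×10^42 figure quoted above can be reproduced from the Bekenstein bound, I ≤ 2πRE / (ħ c ln 2). The brain mass (1.5 kg) and radius (6.7 cm, treating the brain as a sphere) are assumed inputs that do not appear in this thread, so treat this as a rough sketch of the calculation.

```python
import math

HBAR = 1.054571817e-34   # reduced Planck constant, J*s
C = 299_792_458.0        # speed of light, m/s

def bekenstein_bits(radius_m, mass_kg):
    """Upper bound on information (in bits) inside a sphere of given radius and mass."""
    energy = mass_kg * C ** 2                      # rest energy, E = m c^2
    return 2 * math.pi * radius_m * energy / (HBAR * C * math.log(2))

# Assumed figures: 1.5 kg brain approximated as a sphere of radius 6.7 cm.
brain_bits = bekenstein_bits(radius_m=0.067, mass_kg=1.5)
print(f"{brain_bits:.2e} bits")
```

The bound is finite but says nothing about how one would measure or store that state, which is the objection raised elsewhere in the thread.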

You are making the assumption that your body consists of a static set of atoms, but your body is a living thing. Your lookup table would end up containing the entire universe to account for extremely remote possibilities.
those would just be inputs.
My "what if someone made a lookup table of everything I ever said in response to something else" hypothetical is pretty flimsy - I realized that right after writing it.

The point I wanted to make is that concepts of sentience, consciousness, reasoning, intelligence, etc. are very philosophically loaded ideas.

Responding to your comment, I don't think anyone credible is arguing that a human being is somehow the same as a neural network. I think the question at play here is "what constitutes reasoning?" - and more specifically "can a deterministic process reason?"

This is not a new debate at all - an abacus can tell us truths about the world, but we don't consider the abacus intelligent. Is GPT-4 somehow different, or is it a very large abacus?

As a numerical function it can be implemented on an abacus so I don't think it's any different from a large enough abacus. It's practically not feasible but theoretically there is no idealization or abstraction happening when numerical calculations on a computer are transferred to an abacus.
> People can not be reduced to lookup tables even in theory.

Yes they can, this is a direct corollary of the Bekenstein Bound.

Yes, this is one view of machine learning, the idea that you are training some function to map input to output, similar to "looking up" what output is addressed by some input.

And that's why the concept of generalization is so important in machine learning, and as a consequence, why the internal representation of that "lookup" matters.

By definition a lookup table can only store data it is given. However, the idea of ML systems is actually to predict values of inputs that are similar to but not given in their training data.

Interpolation and extrapolation, key components to applying ML systems to new data and therefore critical for actual usage, are enabled by internal representations that allow for modeling the space between and around data points. It so happens that multilayer neural networks accomplish this by general and smoothed (due to regularization tricks and inductive biases) iterative warpings of the representation (embedding) space.
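A toy contrast between storing and generalizing, as a sketch: the training pairs and the straight-line model are hypothetical, and a least-squares line stands in for whatever representation a real network learns.

```python
# Hypothetical training data: five points on the line y = 2x + 1.
train = {x: 2 * x + 1 for x in range(5)}

def table_predict(x):
    """A literal lookup table: returns None for anything it was not given."""
    return train.get(x)

def fit_line(pairs):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(pairs) / n
    mean_y = sum(pairs.values()) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in pairs.items())
             / sum((x - mean_x) ** 2 for x in pairs))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(train)

def model_predict(x):
    return slope * x + intercept

print(table_predict(2.5), model_predict(2.5))   # interpolation
print(table_predict(10), model_predict(10))     # extrapolation
```

Both agree on the five stored points, but only the fitted model has an answer between and beyond them. Whether that answer is right depends on the model's inductive bias matching the data, which here it does by construction.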

Due to the manifold hypothesis, we can interpret this as determining underlying and semantically meaningful subspaces, and unfolding them to perform generalized operations such as logical manipulations and drawing classification boundaries in some relatively smooth semantic space, then refolding things to drive some output representation (pixels, classes, etc.)

Another view on this is that these manipulations allow a kind of compression by optimizing the representation to make manipulations easier, in other words they re-express the data in a form that allows algorithmic evaluation of some input program. This gives the chance of modeling intrinsic relationships such as infinite sequences as vector programs. (Here I mean things like mathematical recursions, etc.) When this is accomplished, and it happens due to the pressure to optimally compress data, you could say that "understanding" emerges, and the result is a program that extrapolates to unseen values of such sequences. At this point you could say that while the input-output relationship is like a lookup table, functionally it is not the same thing because the need to compress these input-output relationships has led to some representation which allows for extrapolation, aka "intelligence" by some definitions.

The fact that these systems are still very dumb sometimes is simply due to not developing these representations as well as we would like them to, for a variety of reasons. But theoretically this is the idea behind why emergence might occur in an NN but not in a lookup table.

Take a relatively simple large language model like Llama 1. It has a context of 2048 tokens and each token can be one of 32,000 values. So the lookup table would need 32,000^2048 entries. That's not just impractically large, that's larger than cosmically large. There are only estimated to be about 10^80 atoms in the visible universe. So while a 32,000^2048 lookup table might be a valid concept mathematically, it's not anything you can intuit physically, and therefore not something you can say is incapable of reason.
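A quick back-of-the-envelope check of that size, working in logarithms so the number never has to be written out. The vocabulary and context figures are the ones given in the comment; the 10^80 atom count is the estimate it cites.

```python
import math

VOCAB = 32_000      # token values, as stated above
CONTEXT = 2_048     # context length in tokens, as stated above
ATOMS_LOG10 = 80    # ~10^80 atoms in the visible universe (estimate from above)

# log10(32000^2048) = 2048 * log10(32000)
log10_entries = CONTEXT * math.log10(VOCAB)

print(f"table entries  ~ 10^{log10_entries:.0f}")
print(f"entries / atom ~ 10^{log10_entries - ATOMS_LOG10:.0f}")
```

So the table would have on the order of 10^9226 rows, counting only full-length contexts; shorter prompts add comparatively little to the total.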
Every program is a compressed representation of its output. This is from Kolmogorov complexity, which you learn in any CS complexity theory course.

So, a neural network being a compressor/decompressor is nothing special.

Note, however, that supposing a context window of 1000 units, then we are looking at K = 2^1000 ≈ 10^301 different entries in the truth table. Somehow, your LLM neural network is the result of compressing an amount of possible information on the order of 10^301, which of course could never be seen in full -- to compress a JPEG at least you have access to the original image, not just two pixels in it.

Anyways, the philosophical debate is whether you believe programs can think, whether machine intelligence is meaningful at all by definition. Some say yes, others say no. When humans think, are not our abstractions and ideas a kind of compression?

This is an old argument against determinism - I think a serious challenge is that:

1. Modern physics suggests you can implement such a lookup table for any subset of our universe.

2. We are a subset of the universe.

3. Therefore we are representable by lookup tables too.

...so your argument appears to prove too much, namely that humans aren't thinking beings either. Which is fine, but personally I don't think that's a useful definition of "thinking".

We're not a lookup table of the things we're, eg., saying, or doing etc. Nor are we looking up, in this sense, when we act.

ie., when you compress text into an NN and use it to generate text, the generated text is just a synthesis of the compressed text.

Whereas when I type, I am not synthesising text. Rather I have the skill of typing, I have an interior subjectivity of thoughts, I have memories which aren't text, and so on.

When my fingers move across the keyboard it isn't because they are looking up text.

Our causal properties (experiencing, thinking, seeing, feeling, remembering, moving, speaking, growing, digesting ...) are not each, "index on the total history of prior experience", "index on the total history of prior seeing". The world directly causes, eg., us to see -- seeing isn't a lookup table of prior seeings.

( Also, the whole of physics is formulated in terms that cannot be made into a lookup table; and there is no evidence, only insistence, of the converse. )

I strongly disagree with your last statement - physics explicitly _is_ formulated in terms that can be made into a lookup table (see phase spaces in classical mechanics, for instance).

My point is that there's a finite light cone of possible causal influences over you at any moment in time, and in principle you can break those down into state variables finely enough to predict future states of a person. This is isomorphic to a lookup table, albeit one we aren't able to construct right now.

I'm not suggesting it's enough to consider just the person in this scenario - the causal factors are part of the lookup.

How do you lookup quantum mechanics? Please tell the physicists about your breakthroughs.
No need, physicists already do this all the time - any computer simulation of quantum mechanical systems has to come to terms with the same problems (namely quantising the state space and representing the dynamics deterministically).
Physicists simulate on computers only what can be, which is almost nothing. Consider obtaining the dynamics of water by simulating all its parts: proton flow, hydrogen bonding etc. of 10^{PHYSICALLY UNCOMPUTABLE} interactions.

The simulations which do exist fail to model vast amounts. This is why, say, climate change is given as a prediction on temperature -- because it can be obtained as a mean which ignores "basically everything".

And it can easily be shown that the assumptions of QM are false if Hilbert space is computable (QM becomes non-linear); and of classical mechanics (which becomes non-deterministic); and so on. ie., that the issue isn't merely 10^{PHYSICALLY UNCOMPUTABLE} but that non-computable functions are essential to the formulation.

The assertion that the world is computable is just that: there are no research projects, no textbooks, no experiments, no formalism to replace physics or anything like it -- nothing. All the basic assumptions of physics would have to be false, and we would have to have good reasons for supposing so.

This is just nonsense. The world is geometrical as described by physics. It is not computational as described by the discrete mathematician whose megalomania and platonism knows no bounds.

"I am not synthesising text"

Then what are you doing?

I think you are falling for the same arguments as 'mystics'. Somehow your inner thoughts are un-explainable. But nothing in your argument explains it, you are just taking your own inner experience itself as the mystical explanation.

The old 'I think therefore I am' argument.

And where did the 'thinking' come from?

It's entirely explainable, I just explained it in the other comment, "Somatosensory representations are built by the sensory-motor system."

Do you really think the meat of my body is growing so as to record every symbol I've seen, or even an induction across them?

I find this inability to think outside of the switching frequencies of a silicon chip as they model the patterns of text tokens on reddit, absolutely bizarre.

You're an ape, have some appreciation for it. You're much more interesting than what openai can steal from amazon's ebook library.

My view is opposite.

"inability to think outside of the switching frequencies of a silicon chip"

Why can't you "think outside", and conceptualize that this 'internal subjective experience' we are having, is not unique, and could be taking place inside a silicon based NN?

It helps to meditate and observe your own thoughts, and how they arise. You can begin to realize that you don't 'think'. You don't think about what to think and thus think it. Thoughts just arise unbidden.

It's back to the Schopenhauer quote: “A man can do as he will, but not will as he will.”

Then, when you begin to notice the mechanistic nature of your own mind, you will have less resistance to how 'silicon' is also reacting. Silicon and Carbon both reacting to inputs, processing.

Organisms are mechanistic, there's nothing "not mechanistic" about the mind.

It's insane to suppose that somatosensory representation building, which requires organic neuroplasticity to be connected to the organically adaptive neuromotor system, etc. etc. etc. can just be instantiated in a bit of sand.

This is deeply mystical, pseudoscience.

You're credulously throwing away any kind of empirical analysis of the world in terms of its properties and their mechanisms for the deeply mystical view that, uniquely among all properties of the world, consciousness needs no empirical analysis of the properties of the systems which have it.

Of gold we ask: what makes it shine; of fire: what makes it hot.. and so on for everything empirical in the world.

But with the mind we must stop! No! No! do not do any science! please that might mean we can't be scammed by a VC out of our investment money; I cannot babble endlessly about scifi Star Trek episodes! no no! please do not rob me of my scifi religion! please please, do not ruin commander data for me!

Well there is no commander data. And the properties of gold are not those of sand, nor those of animals. And just as no bit of silicon will be transmuted into gold by the running of an NN on its electric field; likewise, no bit of silicon will desire or wish or conceive of anything.

We are biological organisms; we do not have souls, even if in your religion, the soul is "a pattern". Your consciousness will not be uploaded; commander data will never visit; and your local VC shyster is on a stock manipulation grift to bamboozle you out of money.

Typing is just a medium, it is irrelevant. Seeing and all the other senses that you mentioned are input within a context window.
uhuh.. and how do you form the inputs into that context window?

Turns out you need to move (indeed, adapt) the body in order to form the very techniques which become concepts that can be given as inputs.

The eye does not move on its own, it has to be directed to attend to reality as conceptualised -- where do these come from? Somatosensory representations are built by the sensory-motor system.

Or, simply: in order to first think, we move.

How are people lookup tables? In the case of neural networks the representation of the table is obvious, it's just numbers. What would be the equivalent table for the liver?

My argument isn't abstract. Neural networks really are just numerical functions which can be expanded into their equivalent graph representations.

Not sure what he's referring to in terms of modern physics saying we're just a lookup table but at the very least, you could say the same thing about the conversation that we're having now. You read words, those words map to meaning representation in our heads, we then generate a response.
Obviously if we are interacting over a digital medium then the responses will be encoded as numbers but there is no way to reduce an entire person to a lookup table. Measured output of human behavior can be expressed as lists of numbers but thinking is not the same as the list of numbers, unlike in the case of neural networks where the graph and the network are actually equivalent.
You could represent all the input on different levels as numbers, e.g. all EM waves hitting our eyes, then all the physical output from our body also as numbers, and everything that causes this output from input within is what you would consider to be a lookup table.
What are the dimensions of the input and output spaces involved in this idealization? In the case of a neural network there is no idealization. The network is software, it's a number. Its inputs and outputs are all bounded and can be expressed as a table of bounded tuples.
People really are just stacks of molecules that can be broken down into their causal properties - moreover, we know those causal properties to a high degree of accuracy these days.

I'm suggesting that for any given human/environment pair, there is a lookup table that produces that person's actual behaviour in that situation. Modern physics lets us approximate this lookup table, and presumably better physics would give us a better lookup table.

Since human behaviour can in principle be described with a lookup table, I see this as a bad reason to rule out a system as "thinking".

Perhaps there is another way to describe neural nets, one that does not use the language of lookup tables, that makes it feel more like thinking and less like lookups.

One such approach I've seen is looking for embedded world models in neural nets.

Suppose that we used embeddings as the input of the model rather than piece identifiers plus an embedding lookup table. This is possible with every transformer model and some libraries provide an API to do this. Moreover, we convert the parameters and ops to use arbitrary precision types. Then the network cannot be represented as a lookup table. Given that there is an infinite number of inputs, there is also an infinite number of outputs. But the arbitrary-precision network does not operate fundamentally different from the original network. It has the same parameters, ops, etc., yet you cannot store it as a (finite) lookup table.
Even if you increase the precision I can still generate a table T(P) for each fixed precision P. So the table is parametrized by P but it's still a table. The entire table T = colim T(P) is the colimit over all precision values but for every finite precision it is still a table.
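One way to picture the family T(P): fix a precision of P bits, quantise the input, and tabulate. In this sketch `f` is an arbitrary stand-in for the network (here just `math.sin` on the interval [0, 1)), and `build_table` and `lookup` are hypothetical helper names.

```python
import math

def f(x):
    """Stand-in for the network: any function on [0, 1) would do."""
    return math.sin(x)

def build_table(precision_bits):
    """T(P): one row for each of the 2**P representable inputs in [0, 1)."""
    n = 2 ** precision_bits
    return {k: f(k / n) for k in range(n)}

def lookup(table, precision_bits, x):
    """Quantise x down to P bits, then read the answer from the table."""
    n = 2 ** precision_bits
    k = min(int(x * n), n - 1)
    return table[k]

for p in (4, 8, 12):
    t = build_table(p)
    err = abs(lookup(t, p, 0.3) - f(0.3))
    print(f"P={p:2d}  rows={len(t):5d}  error at x=0.3: {err:.6f}")
```

Each T(P) is finite and its error shrinks as P grows, but no single finite table covers every P at once, which is roughly the point of disagreement in the replies that follow.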
I did not say fixed precision. I said arbitrary precision, so P is infinite.

The only counter-argument is that even arbitrary precision is fixed-precision because computer memory is finite. But that's kind of a silly argument, because then you are arguing that computers can never reason, because they have finite memory, and moreover humans cannot reason either, because there is a finite number of brain cells.

P obviously can't be infinite, even in theory, if you want the computation to terminate.
Right, but as others said, you are then also arguing that humans cannot reason, since the universe is a system with a finite number of particles. Or, if we exclude external factors, because humans have a finite number of brain cells.

In the end it all depends on what your definition of reasoning is, which you did not provide.

The bit precision of computation is always finite for halting computations and any finite computation can be turned into a lookup table which does no thinking or reasoning other than comparing two numbers and then extracting the value corresponding to the input key.

My argument carries through for any piece of software so if you think software can think and reason then you can remain unconvinced by my argument.

In any case, I have to drop out of this thread.

Just like to point out that RNNs have internal state which isn't captured in this view, so yes, lots of NNs can be considered this way, but not all. It's the DSP equivalent of FIRs vs IIRs.
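The FIR/IIR analogy can be made concrete with two tiny filters. This is a sketch with made-up coefficients: the FIR output depends only on a fixed window of recent inputs, while the IIR output feeds back into itself, as the hidden state of an RNN does.

```python
def fir(xs, taps=(0.5, 0.5)):
    """Finite impulse response: each output uses only the last len(taps) inputs."""
    out = []
    for n in range(len(xs)):
        out.append(sum(t * xs[n - k] for k, t in enumerate(taps) if n - k >= 0))
    return out

def iir(xs, a=0.5):
    """Infinite impulse response: the previous output is carried forward as state."""
    out, state = [], 0.0
    for x in xs:
        state = a * state + x
        out.append(state)
    return out

impulse = [1.0] + [0.0] * 7
print(fir(impulse))   # response dies out after two samples
print(iir(impulse))   # response decays but never reaches exactly zero here
```

A table keyed on a fixed-size window captures the FIR case completely. For the IIR case the key would have to include the carried state, or the entire input history.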
This whole thread on lookup tables seems to be confused.

Isn't this purely math? The equivalence of a function to a lookup table is well studied, and an NN, being composed of functions, can be boiled down to a table as posted.

How do we get from this math concept of function=table, and get to arguments about consciousness and free-will and state space of the universe...

The table-NN equivalence doesn't seem to help people's understanding of NNs.

People are just outright abusing the terminology. OP's argument would also conclude that a sorting algorithm is not a "real" algorithm because it too can be done by an infinite lookup table.
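To illustrate the sorting comparison, here is a sketch that replaces sorting with a precomputed table on a small finite domain. The alphabet and maximum length are arbitrary choices made so the table stays tiny; an unbounded domain would need the infinite table mentioned above.

```python
from itertools import product

ALPHABET = (0, 1, 2)   # arbitrary small alphabet
MAX_LEN = 3            # arbitrary bound on input length

# One row per possible input sequence of length 0..MAX_LEN.
SORT_TABLE = {
    seq: tuple(sorted(seq))
    for n in range(MAX_LEN + 1)
    for seq in product(ALPHABET, repeat=n)
}

def table_sort(seq):
    """'Sort' by lookup alone; raises KeyError outside the tabulated domain."""
    return SORT_TABLE[tuple(seq)]

print(len(SORT_TABLE), "rows")
print(table_sort([2, 0, 1]))
```

On its domain the table is input-output equivalent to the algorithm, yet few people would say that makes `sorted` any less of an algorithm.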

That said, the general debate is a valid one. Are LLMs just doing fancy statistical compression of data, or are they doing "reasoning" in some important sense, be that merely mechanistic logical reasoning, or "human-level intelligent reasoning"?

For that matter, did the paper authors ever define "Reasoners" in their title, or leave it to the reader?

Sure. I agree.

The debate is good. I just don't see how the 'table-lookup' analogy is helping.

Except maybe by helping people see the non-free-will nature of the universe. But it seems like the people who reject this are also the ones rejecting the 'table-function' equivalence.

> thinking how every neural network is equivalent to a lookup table where the input is all numbers up to what can be expressed within the context window and the output is the result of the arithmetic operations applied to that number... no neural network is actually doing any thinking other than uncompressing the table and looking up the value corresponding to the input number

You're proposing the lookup table as one possible mechanism in Searle's Chinese room, then proposing Searle's conclusion?

“Searle argues that, without ‘understanding’ (or ‘intentionality’), we cannot describe what the machine is doing as ‘thinking’ and, since it does not think, it does not have a ‘mind’ in anything like the normal sense of the word. Therefore, he concludes that the ‘strong AI’ hypothesis is false.”

https://en.wikipedia.org/wiki/Chinese_room

I think you've said Chinese room, run as many times as it takes to get all possible sequences of Chinese characters to cache the results, then using those run it and ask if it's still or yet ‘thinking’.

PS. Where did the arithmetic operations come from? How did they come to be as they are? Is iterating to an algo that does that, ‘learning’? What's the difference between this and lossy or non-lossy compression of information? Could it be said the arithmetic operations are a compression of the lookup table into that which has the ‘right’ response given the inputs? If two different sets of arithmetic operations give by and large the same outputs from inputs, is one of them more ‘reasoning’ than the other depending how it's derived? What do we mean by ‘learning’ and ‘reasoning’ when applying those words to humans? Are teachers telling students to ‘show your work’ searching for explainable intelligence? :-)

Here's a counterexample. Suppose I create a simple neural network that computes f(x) = x^2 + c (where x and c are complex numbers) and then I run it as an RNN. This RNN will compute the Mandelbrot set, which can't be represented by a lookup table.

You can't even know if the RNN will halt for a given input. Neural networks are stronger than lookup tables, they are programs.
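The recurrence in question is easy to write down. This sketch uses the standard escape-time test (a point has escaped once its magnitude exceeds 2) with an iteration budget; `escape_time` and `max_iter` are illustrative names, and Python's complex floats have finite precision, so this is an approximation rather than the mathematical object.

```python
def escape_time(c, max_iter=100):
    """Iterate z -> z*z + c from z = 0; return the step at which |z| exceeds 2."""
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2:
            return n
        z = z * z + c
    return None   # budget exhausted: undecided, not proof of membership

for c in (0j, -1 + 0j, 1 + 0j, 0.26 + 0j):
    print(c, escape_time(c))
```

A `None` only means the point had not escaped within the budget. For points near the boundary, raising `max_iter` can change the answer, which is the halting-style difficulty the comment is pointing at.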

Every computable function can be represented by a (possibly infinite)¹ lookup table.

Computer programs can only compute computable functions. Therefore any computer program is (in theory) equivalent to a table lookup.

¹ For finite inputs, the lookup table can be finite, and for infinite inputs, the lookup table can be infinite but still countable, as the set of computable functions is countable.

This table is not computable. If you had this table, you could solve the halting problem by simply looking up whether the program produced an output.
You've just restated the halting problem.

Nobody claimed that there is an algorithm to translate arbitrary programs into an equivalent lookup table. (Because that's the exact same proposition as stating that there is a program that can compute whether an arbitrary program halts when executed).

The point is: Any specific program can be translated into a lookup table. Computer programs and lookup tables are equivalent!

You claimed that computer programs are somehow "more powerful" than lookup tables. That's just plain wrong. They're exactly equivalent in "power".

I am sorry to be this blunt but this is really utter and complete nonsense. The phrase that the Mandelbrot set can't be represented in a lookup table is as such true but that is because nothing that you do with finite precision numbers can represent the Mandelbrot set because it essentially is an infinite object. The function f(x) = x^2 + c as an RNN can also not compute the Mandelbrot set if the numbers it uses are of finite precision. That is exactly the same limitation that the lookup table also faces so there is no fundamental difference between the two.
We can give them both infinite precision, you still can't build a lookup table of the Mandelbrot set.

The Mandelbrot set is essentially a map of the halting behavior of a specific program. You can't know whether or not the program will halt for a given input, and so cannot build the lookup table. Programs are stronger than input-output mappings.

Infinity is really hard to reason about, are you sure about that?

(For all I know you're a PhD in transfinites, your profile says nothing).

Infinity (of the various kinds) is well understood (see Cantor etc).

The Halting Problem is a central result in computer science, again well understood (especially here I would think!)

Their comment is correct.

I see you are a fan of flying disembodied brains, but this time without a universe surrounding the brain.