
>No amount of training would cause a fly brain to be able to do what an octopus or bird brain can, or to model their behavioral generating process.

Go back a few evolutionary steps and sure you can. Most ANN architectures have few or no inductive biases baked in, and the Transformer might be the blankest slate we've built yet.

>No amount of training will cause a transformer to magically sprout feedback paths or internal memory, or an ability to alter its own weights, etc.

A transformer can perform any computation it likes in a forward pass and you can arbitrarily increase inference compute time with the token length. Feedback paths? Sure. Compute inefficient? Perhaps. Some extra programming around the model to facilitate this? Maybe, but the architecture certainly isn't stopping you.

Even if it couldn't, limited ≠ trivial. The human brain is not Turing complete.

Internal memory? Did you miss the memo? Recurrence is overrated. Attention is all you need.

That said, there are already state-keeping language model architectures around.

Altering weights? Can a transformer train continuously? Sure. It's not really compute efficient, but the architecture certainly doesn't prohibit it.

>Architecture matters

Compute efficiency? Sure. What it is capable of learning? Not so much.

> A transformer can perform any computation it likes in a forward pass

No it can't.

A transformer has a fixed number of layers - call it N. It performs N sequential steps of computation to derive its output.

If a computation requires > N steps, then a transformer most certainly cannot perform it in a forward pass.
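As a rough analogy only (a real transformer layer is of course not one step of this loop, and the function names here are made up for illustration), the "N sequential steps" argument looks like this: a procedure that needs a data-dependent number of sequential updates cannot finish inside a fixed budget of N updates for every input.

```python
def collatz_step(n):
    """One sequential update: halve even numbers, map odd n to 3n + 1."""
    return n // 2 if n % 2 == 0 else 3 * n + 1


def steps_needed(n):
    """Number of sequential updates required to reach 1 (varies per input)."""
    steps = 0
    while n != 1:
        n = collatz_step(n)
        steps += 1
    return steps


def fixed_depth_run(n, num_layers):
    """Apply at most num_layers sequential updates, like a fixed N-layer stack.

    Returns whatever value has been reached when the budget runs out.
    """
    for _ in range(num_layers):
        if n == 1:
            break
        n = collatz_step(n)
    return n


if __name__ == "__main__":
    budget = 8  # hypothetical depth N
    for start in (6, 27):
        done = fixed_depth_run(start, budget) == 1
        print(start, "needs", steps_needed(start), "steps; finished within budget:", done)
```

With a budget of 8, the input 6 finishes and the input 27 does not. Whether a trained model can shortcut such a computation some other way is exactly what the thread is arguing about; the sketch only shows the step-counting part of the claim.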

FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model". Attention is not all you need if what you actually want to build is a cognitive architecture.

https://arxiv.org/abs/2310.02226

A transformer produces the next token by manipulating K hidden vectors per layer, one vector per preceding token. So yes, you can increase compute length arbitrarily by increasing the number of tokens. Those tokens don't have to carry any information to work.

And again, human brains are clearly limited in the number of steps they can compute without writing something down. Limited ≠ trivial.

>FYI, "attention is all you need" has the implicit context of "if all you want to build is a language model".

Great. Do you know what a "language model" is capable of in the limit? No.

These top research labs aren't only working on Transformers as they currently exist, but it doesn't make much sense to abandon a golden goose before it has hit a wall.

> And again, human brains are clearly limited in the number of steps they can compute without writing something down

No - there is a loop between the cortex and thalamus, feeding the outputs of the cortex back in as inputs. Our brain can iterate for as long as it likes before initiating any motor output, if any, such as writing something down.

The brain's ability to iterate on information is still constrained by certain cognitive limitations like working memory capacity and attention span.

In practice, the cortex-thalamus loop allows for some degree of internal iteration, but the brain cannot endlessly iterate without some form of external aid (e.g., writing something down) to offload information and prevent cognitive overload.

I'm not telling you anything here you don't experience in your everyday life. Try indefinitely iterating on any computation you like and see how well that works for you.

What's your point?

The discussion is about the architecturally imposed limitations of LLMs, resulting in capabilities that are way less than that of a brain.

The fact that the brain has its own limits doesn't somehow negate this fact!

accountnum 655 days ago

You seem to repeatedly insist that hidden computation is a distinction of any relevance whatsoever.

First of all, your understanding of the architecture itself is mistaken. A transformer can iterate endlessly because each token it produces allows it a forward pass, and each of these tokens is appended to its input in the next inference. That's the autoregressive in autoregressive transformer, and the entire reason why it was proposed for arbitrary seq2seq transduction.

This means you get layers * tokens iterations, where tokens is up to two million, and is in practice unlimited due to the LLM being able to summarize and select from that. Parallelism is irrelevant, since the transformer is sequential in the output of tokens. A transformer can iterate endlessly, it simply has to output enough tokens.
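A minimal sketch of that counting argument, with a toy stand-in for the model (`forward_pass` and `generate` are hypothetical names, and the arithmetic inside is arbitrary): every generated token costs one forward pass of `num_layers` sequential steps, and the token is appended to the input of the next pass.

```python
def forward_pass(tokens, num_layers):
    """Toy stand-in for a model: returns (next_token, sequential_steps_used).

    The state update is arbitrary; only the step count matters here.
    """
    state = sum(tokens)
    for _ in range(num_layers):
        state = (state * 31 + 7) % 1000
    return state, num_layers


def generate(prompt, num_layers, num_new_tokens):
    """Autoregressive loop: each output token is fed back as input."""
    tokens = list(prompt)
    total_sequential_steps = 0
    for _ in range(num_new_tokens):
        next_token, steps = forward_pass(tokens, num_layers)
        tokens.append(next_token)  # output becomes part of the next input
        total_sequential_steps += steps
    return tokens, total_sequential_steps


if __name__ == "__main__":
    tokens, steps = generate([1, 2, 3], num_layers=4, num_new_tokens=10)
    print(len(tokens), "tokens,", steps, "sequential layer steps")
```

The total comes out as layers * generated tokens. Note that the only thing carried from one pass to the next in this sketch is the token itself, which is the bottleneck the other side of the argument points at.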

And no, the throughput isn't limited either, since each token gets translated into a high-dimensional internal representation, which in turn is influenced by every other token in the model input. Models can encode whatever they want not just by choosing a token, but by choosing an arbitrary pattern of tokens encoding arbitrary latent-space interactions.

Secondly, internal thoughts are irrelevant, because something being "internal" is an arbitrary distinction without impact. If I trained an LLM to prepend and append <internal_thought> to some part of its output, and then simply didn't show that part, then the LLM wouldn't magically become human. This is something many models do even today, in fact.

Similarly, if I were to take a human and modify their brain to only be able to iterate using pen and paper, or by speaking out loud, then I wouldn't magically make them into something non-human. And I would definitely not reduce their capacity for reasoning in any way whatsoever. There are people with aphantasia working in the arts, there are people without an internal monologue working as authors - how "internal" something is can be trivially changed with no influence on either the architecture or the capabilities of that architecture.

Reasoning itself isn't some unified process, nor is it infinite iteration. It requires specific understanding of the domain being reasoned over, especially understanding of which transformation rules are applicable to produce desired states, where the judgement about which states are desirable has to be learned itself. LLMs can reason today, they're just not as good at it as humans are in some domains.

HarHarVeryFunny 655 days ago

Sure - a transformer can iterate endlessly by generating tokens, but this is no substitute for iterating internally and maintaining internal context and goal-based attention.

One reason why just blathering on endlessly isn't the same as thinking deeply before answering is that it's almost impossible to maintain long-term context/attention. Try it. "Think step by step" or other attempts to prompt the model into generating a longer reply that builds upon itself will only get you so far, because keeping a 1-dimensional context is no substitute for the thousands of connections we have in our brain between neurons, and the richness of context we're therefore able to maintain while thinking.

The reasoning weakness of LLMs isn't limited to "some domains" that they had less training data for - it's a fundamental architecturally-based limitation. This becomes obvious when you see the failure modes on simple problems of the "how few trips does the farmer need to cross the river with his chicken & corn, etc" type. You don't need to morph the problem to require out-of-distribution knowledge to get it to fail - small changes to the problem statement can make the model state that crossing the river backwards and forwards multiple times without loading/unloading anything is the optimal way to cross the river.

But, hey, no need to believe me, some random internet dude. People like Demis Hassabis (CEO of DeepMind) acknowledge the weakness too.

You are confusing number of sequential steps with total amount of compute spent.

The input sequence is processed in parallel, regardless of length, so the number of tokens has no impact on the number of sequential compute steps, which is always N=layers.
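To put rough numbers on that distinction for a single forward pass, here is a back-of-the-envelope sketch. It assumes causal attention and counts only query-key pairs as "work", which is a simplification of my own; `forward_pass_cost` is a made-up helper, not anything from a real library.

```python
def forward_pass_cost(seq_len, num_layers):
    """Return (sequential_steps, attention_pairs) for one forward pass.

    Assumes causal attention: position i attends to positions 0..i,
    so each layer handles seq_len * (seq_len + 1) / 2 query-key pairs.
    All positions in a layer are processed in parallel, so the
    sequential depth is just the number of layers.
    """
    sequential_steps = num_layers
    attention_pairs = num_layers * seq_len * (seq_len + 1) // 2
    return sequential_steps, attention_pairs


if __name__ == "__main__":
    for seq_len in (10, 1000):
        depth, work = forward_pass_cost(seq_len, num_layers=4)
        print(f"seq_len={seq_len}: depth={depth}, attention pairs={work}")
```

Total work grows roughly quadratically with sequence length while the depth stays at N. This covers one pass over a given input only; it says nothing about generating further tokens.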

> Do you know what a "language model" is capable of in the limit?

Well, yeah, if the language model is an N-layer transformer ...

Fair enough.

Then increase N (N is almost always increased when a model is scaled up) and train or write things down and continue.

A limitless iteration machine (without external aid) is currently an idea of fiction. Brains can't do it so I'm not particularly worried if machines can't either.

Increasing the number of layers isn't a smart way to solve it. In order to be able to reason effectively and efficiently the model needs to use as much, or as little, compute as needed for a given task. Completing "1+1=" should take fewer compute steps than "A winning sequence for white here is ...".

This lack of "variable compute" is a widely recognized shortcoming of transformer-based LLMs, and there are plenty of others. The point apropos this thread is that you can't just train an LLM to be something that it is not. If the generating process required variable compute (maybe thousands of steps) - e.g. to come up with a chess move - then no amount of training can make the LLM converge to model this generative process... the best it can do is to model the outcome of the generative process, not the process itself. The difference is that without having learnt the generative process, the model will fail when presented with a novel input that it didn't see during training, and therefore didn't memorize the "cheat sheet" answer for.

machiaweliczny 656 days ago

How about spiders' intelligence? They don't even have a brain.