Hacker News
by chewxy 945 days ago
TL;DR: LLMs are autocomplete on steroids, but the steroids are stranger than expected, which leads people to overhype them.

I'll share my point of view, and leave you to your own conclusions.

1. A sequence learner is anything that can learn a continuation of sequences (words, letters etc). You may call this "autocomplete".

2. Sequence learners can predict based on statistics (invented by none other than Shannon!), or by some machine learning process.

3. The most popular sequence learners nowadays are LLMs, which are neural networks with attention mechanisms.

4. Neural networks are basically linear algebra expressions: Y = σ(W'x + b). A fun thing is that this basic expression can approximate any other function (provided the function is Lipschitz and not Kolmogorov-Arnold).

5. The aforementioned attention mechanisms pay attention to the input as well as to activations within the neural network (you can think of these as representations of the neural network's knowledge).

6. LLMs are stupidly large. They have excess computation capacity.

7. Due to training procedures, this excess computation capacity may spontaneously organize to form a virtual neural network that performs gradient descent in its forward pass (this sentence is a rough approximation of what really happens).

8. This shows the phenomenon of "in-context learning", which people are strangely very excited about. This is because of the hypothesis that an LLM with in-context learning may also use (i.e. pay attention to) its internal knowledge representation (i.e. its activations).

9. This in-context learning phenomenon relies primarily on the next-token prediction capability. Remove that next-token prediction, and the entire scheme falls apart.
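To make premises 1 and 2 concrete, here is a minimal sketch of a purely statistical sequence learner: a bigram counter that predicts the most frequent next word. The function names (`train_bigram`, `predict_next`, `autocomplete`) and the toy corpus are made up for illustration. This is nowhere near an LLM; it only shows what "autocomplete from statistics" means at its smallest.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent continuation seen in training, or None."""
    followers = counts.get(token)  # .get avoids creating empty entries
    if not followers:
        return None
    return followers.most_common(1)[0][0]

def autocomplete(counts, start, length):
    """Greedily extend `start` by up to `length` predicted tokens."""
    out = [start]
    for _ in range(length):
        nxt = predict_next(counts, out[-1])
        if nxt is None:
            break
        out.append(nxt)
    return out

# Toy corpus (hypothetical, for illustration only).
corpus = "the cat sat on the mat and the cat sat down".split()
model = train_bigram(corpus)
completion = autocomplete(model, "the", 2)
print(completion)  # ['the', 'cat', 'sat']
```

An LLM replaces the lookup table with a neural network and conditions on far more than one previous token, but the interface is the same: given a prefix, produce a continuation.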

From this list of premises, my view is that LLMs are autocomplete on very strange steroids with computational side effects (e.g. in-context learning, which only arises if you do training in a particular way). They have no mind and no concrete understanding of knowledge. They are highly unreliable.

1 comment

0. Ahead of the pattern recognition is a set of layers (with an intentional bottleneck in the middle) that has taken a ton of tokens in small random chunks and has been trained to reproduce the input despite the bottleneck. This network is an autoencoder[1]. In my opinion, autoencoders are almost magic, and it's amazing to me that they work at all.

The autoencoder is then split into an encoder and a decoder, so that tokens going in can be converted to an "embedding" (the values passed through the bottleneck).

It's that layer that does the grunt work of placing similar words near each other in the encoded values.
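A rough sketch of the train-then-split idea, assuming a tiny linear autoencoder (4 inputs, a 2-value bottleneck, no nonlinearity) trained by plain gradient descent on made-up vectors. The sizes, learning rate, data, and function names are all assumptions for illustration, and nothing here is how a production model builds its embeddings.

```python
import random

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(a * b for a, b in zip(row, x)) for row in w]

def init(n_in, n_hidden, seed=0):
    """Small random encoder (n_hidden x n_in) and decoder (n_in x n_hidden)."""
    rng = random.Random(seed)
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    return enc, dec

def reconstruction_loss(enc, dec, data):
    """Mean squared error between each input and its reconstruction."""
    total = 0.0
    for x in data:
        y = matvec(dec, matvec(enc, x))
        total += sum((yi - xi) ** 2 for yi, xi in zip(y, x))
    return total / len(data)

def train(enc, dec, data, lr=0.01, steps=500):
    """Update enc and dec in place so that dec(enc(x)) moves toward x."""
    n_hidden, n_in = len(enc), len(dec)
    for _ in range(steps):
        for x in data:
            z = matvec(enc, x)   # values passed through the bottleneck
            y = matvec(dec, z)   # attempted reconstruction of x
            err = [yi - xi for yi, xi in zip(y, x)]
            # Error pushed back to the bottleneck, using the decoder
            # weights as they were before this step's update.
            grad_z = [sum(dec[i][h] * err[i] for i in range(n_in))
                      for h in range(n_hidden)]
            for i in range(n_in):
                for h in range(n_hidden):
                    dec[i][h] -= lr * 2 * err[i] * z[h]
            for h in range(n_hidden):
                for j in range(n_in):
                    enc[h][j] -= lr * 2 * grad_z[h] * x[j]

# Hypothetical data: 4-dimensional inputs that really only vary in 2 directions.
data = [[1, 1, 0, 0], [0, 0, 1, 1], [1, 1, 1, 1], [0.5, 0.5, 0, 0]]

enc, dec = init(n_in=4, n_hidden=2)
before = reconstruction_loss(enc, dec, data)
train(enc, dec, data)
after = reconstruction_loss(enc, dec, data)
print(before, after)  # reconstruction error before and after training

# After training, the encoder half alone maps an input to its "embedding".
embedding = matvec(enc, [1, 1, 0, 0])
```

Once trained, the decoder can be dropped when you only want embeddings: `matvec(enc, x)` is the encoder on its own.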

re #4. Neural networks are multiple layers of matrix multiplies with biases, and a non-linear output on each layer. The nonlinear part is important, otherwise you could just do the algebra and collapse all the layers down to one matrix multiply.
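The collapse is easy to check by hand. A small sketch with made-up 2x2 weights, biases left out for brevity, and ReLU standing in for σ (an assumption; any nonlinearity makes the same point):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def matvec(w, x):
    """Multiply matrix w (a list of rows) by vector x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def relu(v):
    """Elementwise max(0, t), used here as the nonlinearity."""
    return [max(0, t) for t in v]

# Hypothetical weights and input, chosen only to keep the arithmetic small.
W1 = [[1, -2], [3, 1]]
W2 = [[2, 1], [-1, 1]]
x = [1, 2]

# Two layers with no nonlinearity in between...
two_layers = matvec(W2, matvec(W1, x))
# ...give exactly the same result as one layer with weights W2 @ W1.
collapsed = matvec(matmul(W2, W1), x)
# Put a nonlinearity between the layers and the shortcut no longer works.
with_relu = matvec(W2, relu(matvec(W1, x)))

print(two_layers)  # [-1, 8]
print(collapsed)   # [-1, 8]
print(with_relu)   # [5, 5]
```

Biases don't rescue the linear case either: W2(W1x + b1) + b2 = (W2W1)x + (W2b1 + b2), which is still a single matrix multiply plus a single bias.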

The autoencoder is what makes the autocomplete on steroids actually useful.

[1] https://en.wikipedia.org/wiki/Autoencoder