
by colah3 399 days ago

Since this post is based on my 2014 blog post (https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/ ), I thought I might comment.

I tried really hard to use topology as a way to understand neural networks, for example in these follow ups:

- https://colah.github.io/posts/2014-10-Visualizing-MNIST/

- https://colah.github.io/posts/2015-01-Visualizing-Representa...

There are places I've found the topological perspective useful, but after a decade of grappling with trying to understand what goes on inside neural networks, I just haven't gotten that much traction out of it.

I've had a lot more success with:

* The linear representation hypothesis - The idea that "concepts" (features) correspond to directions in neural networks.

* The idea of circuits - networks of such connected concepts.
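As a rough illustration of the first bullet, here is a minimal sketch of what "features correspond to directions" can mean in practice. The feature names (`curve`, `fur`), the 3-dimensional space, and the helper functions are all hypothetical and chosen for readability; real networks have far more dimensions, and their feature directions are not exactly orthogonal.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

# Hypothetical unit "feature directions" in a toy 3-dimensional activation space.
feature_directions = {
    "curve": normalize([1.0, 1.0, 0.0]),
    "fur": normalize([0.0, 0.0, 1.0]),
}

def compose(strengths):
    """Build an activation vector as a weighted sum of feature directions."""
    activation = [0.0, 0.0, 0.0]
    for name, strength in strengths.items():
        direction = feature_directions[name]
        activation = [a + strength * d for a, d in zip(activation, direction)]
    return activation

def read_feature(activation, name):
    """Read a feature's strength back by projecting onto its direction."""
    return dot(activation, feature_directions[name])

activation = compose({"curve": 2.0, "fur": 0.5})
curve_strength = read_feature(activation, "curve")  # approximately 2.0
fur_strength = read_feature(activation, "fur")      # approximately 0.5
```

Because the two toy directions are orthogonal, each projection recovers exactly the strength that was written in; with non-orthogonal directions the readout would pick up some interference.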

Some selected related writing:

- https://distill.pub/2020/circuits/zoom-in/

- https://transformer-circuits.pub/2022/mech-interp-essay/inde...

- https://transformer-circuits.pub/2025/attribution-graphs/bio...

10 comments

montebicyclelo 399 days ago

Related to ways of understanding neural networks, I've seen these views expressed a lot, which to me seem like misconceptions:

- LLMs are basically just slightly better `n-gram` models

- The idea of "just" predicting the next token, as if next-token-prediction implies a model must be dumb

(I wonder if this [1] popular response to Karpathy's RNN [2] post is partly to blame for people equating language neural nets with n-gram models. The stochastic parrot paper [3] also somewhat equates LLMs and n-gram models, e.g. "although she primarily had n-gram models in mind, the conclusions remain apt and relevant". I guess there was a time where they were more equivalent, before the nets got really really good)

[1] https://nbviewer.org/gist/yoavg/d76121dfde2618422139

[2] https://karpathy.github.io/2015/05/21/rnn-effectiveness/

[3] https://dl.acm.org/doi/pdf/10.1145/3442188.3445922

colah3 399 days ago

I guess I'll plug my hobby horse:

The whole discourse of "stochastic parrots" and "do models understand" and so on is deeply unhealthy because it should be scientific questions about mechanism, and people don't have a vocabulary for discussing the range of mechanisms which might exist inside a neural network. So instead we have lots of arguments where people project meaning onto very fuzzy ideas and the argument doesn't ground out to scientific, empirical claims.

Our recent paper reverse engineers the computation neural networks use to answer in a number of interesting cases (https://transformer-circuits.pub/2025/attribution-graphs/bio... ). We find computation that one might informally describe as "multi-step inference", "planning", and so on. I think it's maybe clarifying for this, because it grounds out to very specific empirical claims about mechanism (which we test by intervention experiments).

Of course, one can disagree with the informal language we use. I'm happy for people to use whatever language they want! I think in an ideal world, we'd move more towards talking about concrete mechanism, and we need to develop ways to talk about these informally.

There was previous discussion of our paper here: https://news.ycombinator.com/item?id=43505748

HarHarVeryFunny 398 days ago

1) Isn't it unavoidable that a transformer - a sequential multi-layer architecture - is doing multi-step inference ?!

2) There are two aspects to a rhyming poem:

a) It is a poem, so must have a fairly high degree of thematic coherence

b) It rhymes, so must have end-of-line rhyming words

It seems that to learn to predict (hence generate) a rhyming poem, both of these requirements (theme/story continuation+rhyming) would need to be predicted ("planned") at least by the beginning of the line, since they are inter-related.

In contrast, a genre like freestyle rap may also rhyme, but flow is what matters and thematic coherence and rhyming may suffer as a result. In learning to predict (hence generate) freestyle, an LLM might therefore be expected to learn that genre-specific improv is what to expect, and that rhyming is of secondary importance, so one might expect less rhyme-based prediction ("planning") at the start of each bar (line).

lo_zamoyski 398 days ago

> The whole discourse of "stochastic parrots" and "do models understand" and so on is deeply unhealthy [...] So instead we have lots of arguments where people project meaning onto very fuzzy ideas and the argument doesn't ground out to scientific, empirical claims.

I would put it this way: the question "do LLMs, etc understand?" is rooted in a category mistake.

Meaning, I am not claiming that it is premature to answer such questions because we lack a sufficient grasp of neural networks. I am asserting that LLMs don't understand, because the question of whether they do is like asking whether A-flat is yellow.

somewhereoutth 398 days ago

Regardless of the mechanism, the foundational 'conceit' of LLMs is that by dumping enough syntax (and only syntax) into a sufficiently complex system, the semantics can be induced to emerge.

Quite a stretch, in my opinion (cf. Plato's Cave).

visarga 398 days ago

> Regardless of the mechanism, the foundational 'conceit' of LLMs is that by dumping enough syntax (and only syntax) into a sufficiently complex system, the semantics can be induced to emerge.

Syntax has dual aspect. It is both content and behavior (code and execution, or data and rules, form and dynamics). This means syntax as behavior can process syntax as data. And this is exactly how neural net training works. Syntax as execution (the model weights and algorithm) processes syntax as data (activations and gradients). In the forward pass the model processes data, producing outputs. In the backward pass it is the weights of the model that become the data to be processed.

When such a self-generative syntactic system is in contact with an environment, in our case the training set, it can encode semantics. Inside the model data is relationally encoded in the latent space. Any new input stands in relation to all past inputs. So data creates its own semantic space with no direct access to the thing in itself. The meaning of a data point is how it stands in relation to all other data points.

Another important aspect is that this process is recursive. A recursive process can't be fully understood from outside. Gödel, Turing, and Chaitin prove that recursion produces blind spots, that you need to walk the recursive path to know it, you have to be it to know it. Training and inferencing models is such a process.

The water carves its banks

The banks channel the water

Which is the true river?

Here, banks = model weights and water = language

Nevermark 398 days ago

Anyone who has widely read topics across philosophy, science (physics, biology), economics, politics (policy, power), from practitioners, from original takes, news, etc. ... has managed to understand a tremendous number of relationships due to just words and their syntax.

While many of these relationships are related to things we see and do in trivial ways, the vast majority go far beyond anything that can be seen or felt.

What does economics look like? I don't know, but I know as I puzzle out optimums, or expected outcomes, or whatever, I am moving forms around in my head that I am aware of, can recognize and produce, but couldn't describe with any connection to my senses.

The same when seeking a proof for a conjecture in an idiosyncratic algebra.

Am I really dealing in semantics? Or have I just learned the graph-like latent representation for (statistical or reliable) invariant relationships in a bunch of syntax?

Is there a difference?

Don't we just learn the syntax of the visual world? Learning abstractions such as density, attachment, purpose, dimensions, sizes, that are not what we actually see, which is lots of dot magnitudes of three kinds. And even those abstractions benefit greatly from the words other people use describing those concepts. Because you really don't "see" them.

I would guess that someone who was born without vision, touch, smell or taste, would still develop what we would consider a semantic understanding of the world, just by hearing. Including a non-trivial more-than-syntactic understanding of vision, touch, smell and taste.

Despite making up their own internal "qualia" for them.

Our senses are just neuron firings. The rest is hierarchies of compression and prediction based on their "syntax".

viccis 398 days ago

>Am I really dealing in semantics? Or have I just learned the graph-like latent representation for (statistical or reliable) invariant relationships in a bunch of syntax?

This and the rest of the comment are philosophical skepticism, and Kant blew this apart back when Hume's "bundle of experience" model of human subjects was considered an open problem in epistemology.

EGreg 398 days ago

Can you get into more detail and share some links? Inquiring minds want to know

globnomulous 398 days ago

> Anyone who has widely read topics across philosophy, science (physics, biology), economics, politics (policy, power), from practitioners, from original takes, news, etc. ... has managed to understand a tremendous number of relationships due to just words and their syntax.

You're making a slightly different point from the person you're answering. You're talking about the combination of words (with intelligible content, presumably) and the syntax that enables us to build larger ideas from them. The person you're answering is saying that LLMs work on the principle that it's possible for intelligence to emerge (in appearance if not in fact) just by digesting a syntax and reproducing it. I agree with the person you're answering. Please excuse the length of the below, as this is something I've been thinking about a lot lately, so I'm going to do a short brain dump to get it off my chest:

The Chinese Room thought experiment -- treated by the Stanford Encyclopedia of Philosophy as possibly the single most discussed and debated thought experiment of the latter half of the 20th century -- argued precisely that no understanding can emerge from syntax, and thus by extension that 'strong AI', that really, actually understands (whatever we mean by that) is impossible. So plenty of people have been debating this.

I'm not a specialist in continental philosophy or social thought, but, similarly, it's my understanding that structuralism argued essentially that one can (or must) make sense of language and culture precisely by mapping their syntax. There aren't structuralists anymore, though. Their project failed, because their methods don't work.

And, again, I'm no specialist, so take this with a grain of salt, but poststructuralism was, I think, built partly on the recognition that such syntax is artificial and artifice. The content, the meaning, lives somewhere else.

The 'postmodernism' that supplanted it, in turn, tells us that the structuralists were basically Platonists or Manicheans -- treating ideas as having some ideal (in a philosophical sense) form separate from their rough, ugly, dirty, chaotic embodiments in the real world. Postmodernism, broadly speaking, says that that's nonsense (quite literally) because context is king (and it very much is).

So as far as I'm aware, plenty of well informed people whose very job is to understand these issues still debate whether syntax per se confers any understanding whatsoever, and the course philosophy followed in the 20th century seems to militate, strongly, against it.

Nevermark 398 days ago

I am using syntax in a general form to mean patterns.

We are talking about LLMs and the debate seems to be around whether learning about non-verbal concepts through verbal patterns (i.e. syntax that includes all the rules of word use, including constraints reflecting relations between words' meanings, but not communicating any of that meaning in more direct ways) constitutes semantic understanding or not.

In the end, all the meaning we have is constructed from the patterns our senses relay to us. We construct meaning from those patterns.

I.e. LLMs may or may not “understand” as well or deeply as we do. But what they are doing is in the same direction.

dTal 398 days ago

Curious what you make of symbolic mathematics, then - in particular, systems like Mathematica which can produce true and novel mathematical facts by pure syntactic manipulation.

The truth is, syntax and semantics are strongly intertwined and not cleanly separable. A "proof" is merely a syntactically valid string in some formal system.

mdp2021 399 days ago

Absolutely, the first task should be to understand how and why black boxes with emergent properties actually work, in order to further knowledge - but importantly, in order to improve them and build on the acquired knowledge to surpass them. That implies, curbing «parrot[ing]» and inadequate «understand[ing]».

I.e. those higher concepts are kept in mind as a goal. It is healthy: it keeps the aim alive.

visarga 398 days ago

My favorite argument against SP is zero shot translation. The model learns Japanese-English and Swahili-English and then can translate Japanese-Swahili directly. That shows something more than simple pattern matching happens inside.

Besides all arguments based on model capabilities, there is also an argument from usage - LLMs are more like pianos than parrots. People are playing the LLM on the keyboard, making them 'sing'. Pianos don't make music, but musicians with pianos do. Bender and Gebru talk about LLMs as if they work alone, with no human direction. Pianos are also dumb on their own.

Hendrikto 398 days ago

The translation happens because of token embeddings. We spent a lot of time developing rich embeddings that capture contextual semantics. Once you learn those, translation is “simply” embedding in one language, and disembedding in another.

This does not show complex thinking behavior, although there are probably better examples. Translation just isn’t really one of them.

spartanatreyu 397 days ago

Furthermore: Learning additional languages fine tunes the embedding.

EGreg 398 days ago

This is also the problem I have with John Searle’s Chinese room

nthingtohide 398 days ago

> The model learns Japanese-English and Swahili-English and then can translate Japanese-Swahili directly. That shows something more than simple pattern matching happens inside.

The "water story" is a pivotal moment in Helen Keller's life, marking the start of her communication journey. It was during this time that she learned the word "water" by having her hand placed under a running pump while her teacher, Anne Sullivan, finger-spelled the word "w-a-t-e-r" into her other hand. This experience helped Keller realize that words had meaning and could represent objects and concepts.

As the above human experience shows, aligning tokens from different modalities is the first step in doing anything useful.

agentcoops 399 days ago

1000%. It's really hard to express this to non-engineers who never wasted years of their life trying to work with n-grams and NLTK (even topic models) to make sense of textual data... Projects I dreamed of circa 2012 are now completely trivial. If you do have that comparison ready-at-hand, the problem of understanding what this mind-blowing leap means, to which end I find writing like the OP helpful, is so fascinating and something completely different than complaining that it's a "black box."

I've expressed this on here before, but it feels like the everyday reception of LLMs has been so damaged by the general public having just gotten a basic grasp on the existence of machine learning.

theahura 399 days ago

Thanks for the follow up. I've been following your circuits thread for several years now. I find the linear representation hypothesis very compelling, and I have a draft of a review for Toy Models of Superposition sitting in my notes. Circuits I find less compelling, since the analysis there feels very tied to the transformer architecture in specific, but what do I know.

Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds. And even simple models will, due to optimization pressure, collapse similar-enough features into the same linear direction. I suppose it's hard to reconcile the manifold hypothesis with the empirical evidence that simple models will place similar-ish features in orthogonal directions, but surely that has more to do with the loss that is being optimized? In Toy Models of Superposition, you're using a MSE which effectively makes the model learn an autoencoder regression / compression task. Makes sense then that the interference patterns between co-occurring features would matter. But in a different setting, say a contrastive loss objective, I suspect you wouldn't see that same interference minimization behavior.

colah3 399 days ago

> Circuits I find less compelling, since the analysis there feels very tied to the transformer architecture in specific, but what do I know.

I don't think circuits is specific to transformers? Our work in the Transformer Circuits thread often is, but the original circuits work was done on convolutional vision models (https://distill.pub/2020/circuits/ )

> Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds

(1) There are actually quite a few examples of seemingly linear representations in GANs, VAEs, etc (see discussion in Toy Models for examples).

(2) Linear representations aren't necessarily in tension with the manifold hypothesis.

(3) GANs/VAEs/etc modeling things as a latent Gaussian space is actually way more natural if you allow superposition (which requires linear representations), since the central limit theorem allows superposition to produce Gaussian-like distributions.
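To make point (3) a little more concrete, here is a small simulation sketch of the central-limit-theorem intuition. It is an illustration under assumed numbers, not anything taken from the Toy Models paper: the feature count, the 10% activation rate, and the random weights are all hypothetical. A single sparse feature has a very spiky, non-Gaussian distribution, while one coordinate that sums many such features with different weights looks much closer to Gaussian, as measured by excess kurtosis.

```python
import math
import random

rng = random.Random(0)

N_FEATURES = 100   # sparse features sharing one latent coordinate (assumed)
P_ACTIVE = 0.1     # each feature is active 10% of the time (assumed)
N_SAMPLES = 5000

# Hypothetical weights: how strongly each feature's direction projects
# onto the single latent coordinate we are looking at.
weights = [rng.gauss(0.0, 1.0) for _ in range(N_FEATURES)]

def sample_feature():
    """A sparse feature: 1.0 when active, 0.0 otherwise."""
    return 1.0 if rng.random() < P_ACTIVE else 0.0

def excess_kurtosis(xs):
    """Excess kurtosis: 0 for a Gaussian, large and positive for spiky data."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / (m2 * m2) - 3.0

# One sparse feature on its own versus many features superposed on one coordinate.
single = [sample_feature() for _ in range(N_SAMPLES)]
superposed = [
    sum(w * sample_feature() for w in weights)
    for _ in range(N_SAMPLES)
]

kurt_single = excess_kurtosis(single)          # far from 0 (non-Gaussian)
kurt_superposed = excess_kurtosis(superposed)  # close to 0 (Gaussian-like)
```

The comparison only shows that sums of many independent sparse contributions drift toward a Gaussian shape; it says nothing about whether a trained GAN or VAE actually arranges its features this way.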

theahura 399 days ago

> the original circuits work was done on convolutional vision models

O neat, I haven't read that far back. Will add it to the reading list.

To flesh this out a bit, part of why I find circuits less compelling is because it seems intuitive to me that neural networks more or less smoothly blend 'process' and 'state'. As an intuition pump, a vector x matrix matmul in an MLP can be viewed as changing the basis of an input vector (ie the weights act as a process) or as a way to select specific pieces of information from a set of embedding rows (ie the weights act as state).
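The two readings of the same matmul can be sketched as follows. This is a toy example with a made-up 3x2 weight matrix and a hypothetical helper `vec_matmul`; note that with a non-square matrix the "process" reading is a general linear map rather than a strict change of basis.

```python
def vec_matmul(x, W):
    """Compute x @ W for a row vector x and a matrix W given as a list of rows."""
    n_cols = len(W[0])
    return [sum(x[i] * W[i][j] for i in range(len(W))) for j in range(n_cols)]

# Made-up weights: three rows, each a 2-dimensional "embedding".
W = [
    [1.0, 2.0],
    [3.0, 4.0],
    [5.0, 6.0],
]

# "State" view: a one-hot input simply looks up one stored row of W.
one_hot = [0.0, 1.0, 0.0]
selected_row = vec_matmul(one_hot, W)   # [3.0, 4.0], i.e. W[1]

# "Process" view: a dense input is mapped into new coordinates,
# which is the same thing as a weighted blend of all the rows.
dense = [0.5, 0.0, 2.0]
transformed = vec_matmul(dense, W)      # 0.5 * W[0] + 2.0 * W[2] = [10.5, 13.0]
```

Both calls run identical arithmetic; only the interpretation of the input changes, which is the blending of 'process' and 'state' described above.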

There are architectures that try to separate these out with varying degrees of success -- LSTMs and ResNets seem to have a more clear throughline of 'state' with various 'operations' that are applied to that state in sequence. But that seems really architecture-dependent.

I will openly admit though that I am very willing to be convinced by the circuits paradigm. I have a background in molecular bio and there's something very 'protein pathways' about it.

> Linear representations aren't necessarily in tension with the manifold hypothesis.

True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far.

> GANs/VAEs/etc modeling things as a latent gaussian space is actually way more natural if you allow superposition

Superposition is one of those things that has always been so intuitive to me that I can't imagine it not being a part of neural network learning.

But I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]?

colah3 399 days ago

> True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far.

Note this happens in "uniform superposition". In reality, we're almost certainly in very non-uniform superposition.

One key term to look for is "feature manifolds" or "multi-dimensional features". Some discussion here: https://transformer-circuits.pub/2024/july-update/index.html...

(Note that the term "strong linear representation" is becoming a term of art in the literature referring to the idea that all features are linear, rather than just most or some.)

> I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]?

When you say "individual neurons being used in combination with other neurons to represent more features than neurons", that's a way one might _informally_ talk about superposition, but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend. All kinds of crazy things are possible if you allow non-linear features, and it's not necessarily clear what a feature would mean.

Superposition, in the narrow technical sense of exploiting compressed sensing / high-dimensional spaces, requires linear representations and sparsity.
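A minimal sketch of why linearity and sparsity matter here, under assumed sizes: 1024 hypothetical features are each given a random direction in a 512-dimensional space, only three are active at once, and a plain dot-product readout with a 0.5 threshold identifies the active ones. This is a simplified linear readout, not a real compressed sensing solver, and the dimensions, threshold, and random directions are illustrative choices rather than anything from the papers discussed.

```python
import math
import random

rng = random.Random(0)

N_DIMS = 512        # size of the activation space ("neurons"), assumed
N_FEATURES = 1024   # more features than dimensions

def random_unit_vector(n):
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Each feature gets its own random direction. In high dimensions,
# random directions are nearly (but not exactly) orthogonal.
directions = [random_unit_vector(N_DIMS) for _ in range(N_FEATURES)]

# Sparsity: only a handful of features are active at the same time.
active = sorted(rng.sample(range(N_FEATURES), 3))

# Linearity: the activation is just the sum of the active feature directions.
activation = [0.0] * N_DIMS
for f in active:
    activation = [a + d for a, d in zip(activation, directions[f])]

# Linear readout: project onto every feature direction and threshold.
# Active features read near 1.0; inactive ones only see small interference.
readout = [dot(activation, d) for d in directions]
recovered = [f for f, r in enumerate(readout) if r > 0.5]
```

If many features were active at once, the interference terms would add up and the simple threshold would start to fail, which is one way to see why sparsity is part of the requirement.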

theahura 399 days ago

> One key term to look for is "feature manifolds" or "multi-dimensional features"

I should probably read the updates more. Not enough time in the day. But yea the way you're describing feature manifolds and multidimensional features, especially the importance of linearity-in-properties and not necessarily linearity-in-dimensions, makes a lot of sense and is basically how I default think about these things.

> but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend.

Fair, I'm only passingly familiar with compressed sensing so I'm not sure I could offer a more technical definition without, like, a much longer conversation! But it's good to know in the future that in a technical sense linear representations and superposition are dependent.

> all features are linear, rather than just most or some

Potentially a tangent, but compared to what? I suppose the natural answer is "non linear features" but has there been anything to suggest that neural networks represent concepts in this way? I'd be rather surprised if they did within a single layer. (Across layers, sure, but that actually starts to pull me more towards circuits)

rajnathani 398 days ago

I was going to comment the same about the superposition hypothesis [0] when the OP comment mentioned "I've had a lot more success with: * The linear representation hypothesis - The idea that "concepts" (features) correspond to directions in neural networks", as this concept-per-NN-feature idea seems too "basic" to explain some of the learning that NNs can do on datasets. (Edit: as other HN comments point out, the OP commenter is a cofounder of Anthropic and is behind the superposition research.) On one of our custom-trained neural network models (not an LLM, but audio-based and currently proprietary), we noticed the same thing: the model was able to "overfit" on a large amount of data despite having few parameters relative to the size of the dataset (and that too with dropout in early layers).

[0] https://www.anthropic.com/research/superposition-memorizatio...

j2kun 398 days ago

This has mirrored my experience attempting to "apply" topology in real world circumstances, off and on since I first studied topology in 2011.

I even hesitate now at the common refrain "real world data approximates a smooth, low dimensional manifold." I want to spend some time really investigating to what extent this claim actually holds for real world data, and to what extent it is distorted by the dimensionality reduction method we apply to natural data sets in order to promote efficiency. But alas, who has the time?

riemannzeta 399 days ago

I think it's interesting that in physics, different global symmetries (topological manifolds) can satisfy the same metric structure (local geometry). For example, the same metric tensor solution to Einstein's field equation can exist on topologically distinct manifolds. Conversely, looking at solutions to the Ising Model, we can say that the same lattice topology can have many different solutions, and when the system is near a critical point, the lattice topology doesn't even matter.

It's only an analogy, but it does suggest at least that the interesting details of the dynamics aren't embedded in the topology of the system. It's more complicated than that.

colah3 399 days ago

If you like symmetry, you might enjoy how symmetry falls out of circuit analysis of conv nets here:

https://distill.pub/2020/circuits/equivariance/

riemannzeta 398 days ago

Thanks for this additional link, which really underscores for me at least how you're right about patterns in circuits being a better abstraction layer for capturing interesting patterns than topological manifolds.

I wasn't familiar with the term "equivariance" but I "woke up" to this sort of approach to understanding deep neural networks when I read this paper, which shows how restricted Boltzmann machines have an exact mapping to the renormalization group approach used to study phase transitions in condensed matter and high energy physics:

https://arxiv.org/abs/1410.3831

At high enough energy, everything is symmetric. As energy begins to drain from the system, eventually every symmetry is broken. All fine structure emerges from the breaking of some symmetries.

I'd love to get more in the weeds on this work. I'm in my own local equilibrium of sorts doing much more mundane stuff.

dang 399 days ago

That earlier post had a few small HN discussions (for those interested):

Neural Networks, Manifolds, and Topology (2014) - https://news.ycombinator.com/item?id=19132702 - Feb 2019 (25 comments)

Neural Networks, Manifolds, and Topology (2014) - https://news.ycombinator.com/item?id=9814114 - July 2015 (7 comments)

Neural Networks, Manifolds, and Topology - https://news.ycombinator.com/item?id=7557964 - April 2014 (29 comments)

godelski 399 days ago

Loved these posts and they inspired a lot of my research and directions during my PhDs.

For anyone interested in these, may I also suggest learning about normalizing flows? (They are the broader class that includes flow matching.) They are learnable networks that learn coordinate changes, so the connection to geometry/topology is much more obvious. Of course, the downside of flows is that you're stuck with a constant dimension (well... sorta), but I still think they can help you understand a lot more of what's going on, because you are working in a more interpretable environment.
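For readers who have not met flows before, here is a minimal sketch of the "coordinate change" idea using the simplest possible case, a one-dimensional affine map. The class name `AffineFlow` and the fixed `scale` and `shift` values are hypothetical; in an actual normalizing flow these parameters would be learned and many such invertible layers would be stacked.

```python
import math

def standard_normal_logpdf(z):
    """Log density of the base distribution N(0, 1)."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

class AffineFlow:
    """Invertible coordinate change x = scale * z + shift (scale must be nonzero)."""

    def __init__(self, scale, shift):
        if scale == 0.0:
            raise ValueError("scale must be nonzero for the map to be invertible")
        self.scale = scale
        self.shift = shift

    def forward(self, z):
        return self.scale * z + self.shift

    def inverse(self, x):
        return (x - self.shift) / self.scale

    def log_prob(self, x):
        # Change of variables: log p_x(x) = log p_z(z) - log |dx/dz|.
        z = self.inverse(x)
        return standard_normal_logpdf(z) - math.log(abs(self.scale))

flow = AffineFlow(scale=2.0, shift=1.0)
log_density_at_3 = flow.log_prob(3.0)
```

With these values the flow turns a standard normal into a normal with mean 1 and standard deviation 2, so `log_density_at_3` matches the log density of that distribution at 3. The input and output have the same dimension, which is the constant-dimension restriction mentioned above.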

winwang 399 days ago

hey chris, I found your posts quite inspiring back then, with very poetic ideas. cool to see you follow up here!

adamnemecek 398 days ago

Consider looking into fields related to machine learning to see how topology is used there. The main problem is that some of the cool math did not survive the transition to CS, e.g. the math for control theory is not quite present in RL.

In terms of topology, control theory has some very cool topological interpretations, e.g. toruses appear quite a bit in control theory.

iNic 399 days ago

My guess is that the linear representation hypothesis is only approximately right in the sense that my expectation is that it is more like a Lie Group. Locally flat, but the concept breaks at some point. Note that I am a mathematician who knows very little about machine learning apart from taking a few classes at uni

3abiton 398 days ago

The linear representation hypothesis is quite intriguing. I am curious what the intuition behind it was.

colah3 398 days ago

See https://transformer-circuits.pub/2022/toy_model/index.html#m...

If you're new to this, I'd mostly just look at all the empirical examples.

The slightly harder thing is to consider the fact that neural networks are made of linear functions with non-linearities between them, and to try to think about when linear directions will be computationally natural as a result.