Hacker News new | ask | show | jobs
by theahura 399 days ago
Thanks for the follow up. I've been following your circuits thread for several years now. I find the linear representation hypothesis very compelling, and I have a draft of a review for Toy Models of Superposition sitting in my notes. Circuits I find less compelling, since the analysis there feels very tied to the transformer architecture in specific, but what do I know.

Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds. And even simple models will, due to optimization pressure, collapse similar-enough features into the same linear direction. I suppose it's hard to reconcile the manifold hypothesis with the empirical evidence that simple models will place similar-ish features in orthogonal directions, but surely that has more to do with the loss that is being optimized? In Toy Models of Superposition, you're using a MSE which effectively makes the model learn an autoencoder regression / compression task. Makes sense then that the interference patterns between co-occurring features would matter. But in a different setting, say a contrastive loss objective, I suspect you wouldn't see that same interference minimization behavior.

2 comments

> Circuits I find less compelling, since the analysis there feels very tied to the transformer architecture in specific, but what do I know.

I don't think circuits is specific to transformers? Our work in the Transformer Circuits thread often is, but the original circuits work was done on convolutional vision models (https://distill.pub/2020/circuits/ )

> Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds

(1) There are actually quite a few examples of seemingly linear representations in GANs, VAEs, etc (see discussion in Toy Models for examples).

(2) Linear representations aren't necessarily in tension with the manifold hypothesis.

(3) GANs/VAEs/etc modeling things as a latent gaussian space is actually way more natural if you allow superposition (which requires linear representations) since central limit theorem allows superposition to produce Gaussian-like distributions.

> the original circuits work was done on convolutional vision models

O neat, I haven't read that far back. Will add it to the reading list.

To flesh this out a bit, part of why I find circuits less compelling is because it seems intuitive to me that neural networks more or less smoothly blend 'process' and 'state'. As an intuition pump, a vector x matrix matmul in an MLP can be viewed as changing the basis of an input vector (ie the weights act as a process) or as a way to select specific pieces of information from a set of embedding rows (ie the weights act as state).

There are architectures that try to separate these out with varying degrees of success -- LSTMs and ResNets seem to have a more clear throughline of 'state' with various 'operations' that are applied to that state in sequence. But that seems really architecture-dependent.

I will openly admit though that I am very willing to be convinced by the circuits paradigm. I have a background in molecular bio and there's something very 'protein pathways' about it.

> Linear representations aren't necessarily in tension with the manifold hypothesis.

True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far.

> GANs/VAEs/etc modeling things as a latent gaussian space is actually way more natural if you allow superposition

Superposition is one of those things that has always been so intuitive to me that I can't imagine it not being a part of neural network learning.

But I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]?

> True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far.

Note this happens in "uniform superposition". In reality, we're almost certainly in very non-uniform superposition.

One key term to look for is "feature manifolds" or "multi-diemsnional features". Some discussion here: https://transformer-circuits.pub/2024/july-update/index.html...

(Note that the term "strong linear representation" is becoming a term of art in the literature referring to the idea that all features are linear, rather than just most or some.)

> I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]?

When you say "individual neurons being used in combination with other neurons to represent more features than neurons", that's a way one might _informally_ talk about superposition, but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend. All kinds of crazy things are possible if you allow non-linear features, and it's not necessarily clear what a feature would mean.

Superposition, in the narrow technical sense of exploiting compressed sensing / high-dimensional spaces, requires linear representations and sparsity.

> One key term to look for is "feature manifolds" or "multi-diemsnional features"

I should probably read the updates more. Not enough time in the day. But yea the way you're describing feature manifolds and multidimensional features, especially the importance of linearity-in-properties and not necessarily linearity-in-dimensions, makes a lot of sense and is basically how I default think about these things.

> but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend.

Fair, I'm only passingly familiar with compressed sensing so I'm not sure I could offer a more technical definition without, like, a much longer conversation! But it's good to know in the future that in a technical sense linear representations and superposition are dependent.

> all features are linear, rather than just most or some

Potentially a tangent, but compared to what? I suppose the natural answer is "non linear features" but has there been anything to suggest that neural networks represent concepts in this way? I'd be rather surprised if they did within a single layer. (Across layers, sure, but that actually starts to pull me more towards circuits)

I was going to comment the same about the Superposition hypothesis [0], when the OP comment (edit: Update: The OP commenter is (as pointed by other HN comments, the cofounder of Anthropic) behind the Superposition research) mentioned about "I've had a lot more success with: * The linear representation hypothesis - The idea that "concepts" (features) correspond to directions in neural networks", as this concept-per-NN-feature idea seems too "basic" to explain some of the learning which NNs can do on datasets. On one of our custom trained neural network models (not LLM, but audio-based and currently proprietary) we noticed the same of the ML model being able to "overfit" on a large amount of data despite not many few parameters relative to the size of the dataset (and that too with dropout in early layers).

[0] https://www.anthropic.com/research/superposition-memorizatio...