| > the original circuits work was done on convolutional vision models Oh neat, I haven't read that far back. Will add it to the reading list. To flesh this out a bit, part of why I find circuits less compelling is because it seems intuitive to me that neural networks more or less smoothly blend 'process' and 'state'. As an intuition pump, a vector-matrix matmul in an MLP can be viewed as changing the basis of an input vector (i.e. the weights act as a process) or as a way to select specific pieces of information from a set of embedding rows (i.e. the weights act as state). There are architectures that try to separate these out with varying degrees of success -- LSTMs and ResNets seem to have a clearer throughline of 'state' with various 'operations' that are applied to that state in sequence. But that seems really architecture-dependent. I will openly admit though that I am very willing to be convinced by the circuits paradigm. I have a background in molecular bio and there's something very 'protein pathways' about it. > Linear representations aren't necessarily in tension with the manifold hypothesis. True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far. > GANs/VAEs/etc modeling things as a latent gaussian space is actually way more natural if you allow superposition Superposition is one of those things that has always been so intuitive to me that I can't imagine it not being a part of neural network learning. But I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? 
Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]? |
Note that those repulsion geometries arise in "uniform superposition". In reality, we're almost certainly in very non-uniform superposition.
One key term to look for is "feature manifolds" or "multi-dimensional features". Some discussion here: https://transformer-circuits.pub/2024/july-update/index.html...
(Note that the term "strong linear representation" is becoming a term of art in the literature referring to the idea that all features are linear, rather than just most or some.)
> I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]?
When you say "individual neurons being used in combination with other neurons to represent more features than neurons", that's a way one might _informally_ talk about superposition, but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend. All kinds of crazy things are possible if you allow non-linear features, and it's not necessarily clear what a feature would mean.
Superposition, in the narrow technical sense of exploiting compressed sensing / high-dimensional spaces, requires linear representations and sparsity.
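To make the "linear representations and sparsity" point concrete, here is a rough NumPy sketch (my own illustration, not taken from any of the papers). It assumes random unit vectors as feature directions, binary feature activations, and a plain thresholded linear readout; real compressed sensing would use a proper sparse-recovery solver, and the sizes (2048 features, 512 dimensions) and the 0.5 threshold are arbitrary choices. The names `make_features` and `recovered_support` are hypothetical helpers.

```python
import numpy as np

def make_features(n_features, d_model, rng):
    """Random unit-norm feature directions: many more features than dimensions."""
    W = rng.standard_normal((n_features, d_model))
    return W / np.linalg.norm(W, axis=1, keepdims=True)

def recovered_support(W, active, threshold=0.5):
    """Encode a set of active features linearly, then read them back linearly."""
    f = np.zeros(W.shape[0])
    f[active] = 1.0
    x = f @ W        # activation vector: sum of the active feature directions
    readout = W @ x  # project the activation onto every feature direction
    return set(np.flatnonzero(readout > threshold).tolist())

rng = np.random.default_rng(0)
n_features, d_model = 2048, 512  # assumed sizes: 4x more features than dimensions
W = make_features(n_features, d_model, rng)

# Sparse case: only 4 of 2048 features are active at once.
sparse_active = rng.choice(n_features, size=4, replace=False)
sparse_ok = recovered_support(W, sparse_active) == set(sparse_active.tolist())

# Dense case: half of all features are active, so interference swamps the signal.
dense_active = rng.choice(n_features, size=1024, replace=False)
dense_ok = recovered_support(W, dense_active) == set(dense_active.tolist())

print("sparse recovery exact:", sparse_ok)
print("dense recovery exact:", dense_ok)
```

With these sizes, the sparse case should recover exactly the active features, because random directions in 512 dimensions are nearly orthogonal and four of them interfere very little. The dense case should fail, since the summed interference becomes comparable to the signal itself. Both halves of the claim show up here: the encoding and readout are purely linear, and it only works because few features are active at a time.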