| > True! I suppose I was thinking about a 'strong' form of linear representations, which is something like: features are represented by linear combinations of neurons that display the same repulsion-geometries as observed in Toy Models, but that's not what you're saying / that's me jumping a step too far. Note this happens in "uniform superposition". In reality, we're almost certainly in very non-uniform superposition. One key term to look for is "feature manifolds" or "multi-dimensional features". Some discussion here: https://transformer-circuits.pub/2024/july-update/index.html... (Note that the term "strong linear representation" is becoming a term of art in the literature referring to the idea that all features are linear, rather than just most or some.) > I want to make sure I'm getting my terminology right -- why does superposition necessarily require the linear representation hypothesis? Or, to be more specific, does [individual neurons being used in combination with other neurons to represent more features than neurons] necessarily require [features are linear compositions of neurons]? When you say "individual neurons being used in combination with other neurons to represent more features than neurons", that's a way one might _informally_ talk about superposition, but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend. All kinds of crazy things are possible if you allow non-linear features, and it's not necessarily clear what a feature would mean. Superposition, in the narrow technical sense of exploiting compressed sensing / high-dimensional spaces, requires linear representations and sparsity. |
I should probably read the updates more. Not enough time in the day. But yeah, the way you're describing feature manifolds and multidimensional features, especially the importance of linearity-in-properties and not necessarily linearity-in-dimensions, makes a lot of sense and is basically how I think about these things by default.
> but doesn't quite capture the technical nuance. So it's hard to know the full scope of what you intend.
Fair, I'm only passingly familiar with compressed sensing so I'm not sure I could offer a more technical definition without, like, a much longer conversation! But it's good to know in the future that in a technical sense linear representations and superposition are dependent.
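For my own benefit, here is a rough toy sketch of why the narrow technical sense leans on both linearity and sparsity. Everything in it is an assumption made for illustration: the random unit-vector feature directions, the sizes (256 neurons, 1024 features), the binary feature activations, and the naive dot-product readout, which is a stand-in and not a real compressed-sensing decoder. The helper name `recovery_rate` is made up.

```python
import numpy as np

def recovery_rate(n_neurons, n_features, k_active, trials=50, seed=0):
    """Fraction of trials in which a naive linear readout recovers
    exactly the set of active features (toy setup, all assumptions)."""
    rng = np.random.default_rng(seed)
    # Assumed feature directions: one random unit vector per feature,
    # with more features than neurons.
    W = rng.standard_normal((n_features, n_neurons))
    W /= np.linalg.norm(W, axis=1, keepdims=True)

    hits = 0
    for _ in range(trials):
        # Pick which features are "on" in this sample.
        active = rng.choice(n_features, size=k_active, replace=False)
        f = np.zeros(n_features)
        f[active] = 1.0
        # Linear representation: activations are a sum of feature directions.
        x = f @ W
        # Naive readout: project the activations back onto every direction.
        readout = W @ x
        top = np.argsort(readout)[-k_active:]
        hits += int(set(top.tolist()) == set(active.tolist()))
    return hits / trials

# Same directions and same readout; only the sparsity changes.
sparse_rate = recovery_rate(256, 1024, k_active=2)
dense_rate = recovery_rate(256, 1024, k_active=200)
print(sparse_rate, dense_rate)
```

With only a couple of features active at once, interference between nearly orthogonal directions stays small and the readout should pick out the right features; with hundreds active, the interference swamps the signal. That is at least how I would picture the dependence on sparsity, under these toy assumptions.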
> all features are linear, rather than just most or some
Potentially a tangent, but compared to what? I suppose the natural answer is "non-linear features", but has there been anything to suggest that neural networks represent concepts in this way? I'd be rather surprised if they did within a single layer. (Across layers, sure, but that actually starts to pull me more towards circuits.)