|
|
|
|
|
by theahura
399 days ago
|
|
Thanks for the follow up. I've been following your circuits thread for several years now. I find the linear representation hypothesis very compelling, and I have a draft of a review for Toy Models of Superposition sitting in my notes. Circuits I find less compelling, since the analysis there feels very tied to the transformer architecture in specific, but what do I know. Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds. And even simple models will, due to optimization pressure, collapse similar-enough features into the same linear direction. I suppose it's hard to reconcile the manifold hypothesis with the empirical evidence that simple models will place similar-ish features in orthogonal directions, but surely that has more to do with the loss that is being optimized? In Toy Models of Superposition, you're using a MSE which effectively makes the model learn an autoencoder regression / compression task. Makes sense then that the interference patterns between co-occurring features would matter. But in a different setting, say a contrastive loss objective, I suspect you wouldn't see that same interference minimization behavior. |
|
I don't think circuits is specific to transformers? Our work in the Transformer Circuits thread often is, but the original circuits work was done on convolutional vision models (https://distill.pub/2020/circuits/ )
> Re linear representation hypothesis, surely it depends on the architecture? GANs, VAEs, CLIP, etc. seem to explicitly model manifolds
(1) There are actually quite a few examples of seemingly linear representations in GANs, VAEs, etc (see discussion in Toy Models for examples).
(2) Linear representations aren't necessarily in tension with the manifold hypothesis.
(3) GANs/VAEs/etc modeling things as a latent gaussian space is actually way more natural if you allow superposition (which requires linear representations) since central limit theorem allows superposition to produce Gaussian-like distributions.