	by goldemerald 437 days ago
	This is an interesting line of research, but it is missing a key aspect: there are (almost) no references to the linear representation hypothesis. Much recent work on neural network interpretability has shown that individual neurons are polysemantic, and therefore practically useless for explainability. My hypothesis is that fitting linear probes (or a sparse autoencoder) would reveal linearly encoded semantic attributes. It is unfortunate, because they briefly mention Neel Nanda's Othello experiments but not the wide array of related experiments, such as the NeurIPS Oral "Linear Representation Hypothesis in Language Models" or even Golden Gate Claude.
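
To make the linear-probe idea concrete, here is a minimal sketch on synthetic data, not the paper's setup. It assumes a scalar concept is written into the activations along one hidden direction that is spread across many neurons; `w_true`, the noise scale, and the array sizes are all made up for illustration. A least-squares probe then tries to recover that direction.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_neurons = 2000, 16

# Assumption: the concept lives along one unit direction that is not axis-aligned,
# so every neuron carries a small share of it.
w_true = rng.normal(size=n_neurons)
w_true /= np.linalg.norm(w_true)

concept = rng.normal(size=n_samples)  # scalar concept value per sample
acts = np.outer(concept, w_true) + 0.5 * rng.normal(size=(n_samples, n_neurons))

# Linear probe: least-squares weights that predict the concept from the activations.
w_probe, *_ = np.linalg.lstsq(acts, concept, rcond=None)

# How well does the probe direction match the hidden one?
cosine = w_probe @ w_true / np.linalg.norm(w_probe)

# Compare the probe readout with the single most informative neuron.
probe_corr = np.corrcoef(acts @ w_probe, concept)[0, 1]
best_neuron_corr = max(
    abs(np.corrcoef(acts[:, j], concept)[0, 1]) for j in range(n_neurons)
)
print(f"cosine={cosine:.3f} probe={probe_corr:.3f} best neuron={best_neuron_corr:.3f}")
```

In this toy setting the probe direction lines up closely with the hidden direction, while no single neuron tracks the concept as well as the probe readout does. Real activations may of course behave differently.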

2 comments

akarshkumar0101 437 days ago

We address exactly this issue in the fourth paragraph of Section 4 and in Appendix F!

goldemerald 437 days ago

That passage addresses the incomprehensibility of PCA and of applying a transformation to the entire latent space. I've never found PCA to be meaningful for deep learning. As far as I can tell, the polysemanticity of neurons cannot be addressed with a single linear transformation. There is no sparse analysis (via linear probes or SAEs), and hence the issue remains unaddressed.

ipunchghosts 437 days ago

Does what you're saying imply that there is a rotation matrix you can apply to each activation output to make it less entangled?

goldemerald 437 days ago

Not quite. For an underlying semantic concept (e.g., smiling face), you can go from a basis vector [0,1,0,...,0] to the original latent space via a single rotation. You could then induce that concept by moving the original latent point along the resulting linear direction.
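
A small sketch of that steering picture, under loud assumptions: the orthogonal matrix `R` here is random (built with a QR decomposition) rather than learned from any model, and the latent dimension, the latent point `z`, and the step size `alpha` are arbitrary. It only illustrates the geometry of mapping a basis vector to a latent direction and then stepping along it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # hypothetical latent dimension

# Hypothetical orthogonal map (a rotation, possibly composed with a reflection),
# taken from the Q factor of a random matrix.
R, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Basis vector [0, 1, 0, ..., 0] standing in for one semantic concept.
e1 = np.zeros(d)
e1[1] = 1.0

# The concept's direction in the original latent space (column 1 of R).
direction = R @ e1

# Steer a latent point by moving along that direction.
z = rng.normal(size=d)
alpha = 3.0
z_steered = z + alpha * direction

# Read the concept coordinate back out before and after steering.
coord_before = direction @ z
coord_after = direction @ z_steered
print(coord_after - coord_before)
```

Because `R` is orthogonal, the step changes only the one concept coordinate in the rotated basis and leaves the other coordinates untouched.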

ipunchghosts 437 days ago

I think we are saying the same thing. Please correct me where I am wrong, though. You could look at the maps in some way, but instead of the basis being the one-hot dimensions (the standard basis), it could be rotated.
