| HN Mirror


by bluecoconut 699 days ago

Nice~ Glad to see this published / confirmed by others. Next I hope to see some of this symmetry used to improve MoE / dynamic compute / adaptive style models!

Context: I found the same structure a year or so ago (early, middle, and end layers serving different purposes, including the permutability of the middle layers), but never got around to testing more models rigorously or publishing it.

We talked about it a bit in a Hacker News thread a few months ago (https://news.ycombinator.com/item?id=39504780#39505523):

> One interesting finding though (now that I'm rambling and just typing a lot) is that in a static model, you can "shuffle" the layers (eg. swap layer 4's weights with layer 7's weights) and the resulting tokens roughly seem similar (likely caused by the ResNet style backbone). Only the first ~3 layers and last ~3 layers seem "important to not permute". It kinda makes me interpret models as using the first few layers to get into some "universal" embedding space, operating in that space "without ordering in layer-order", and then "projecting back" to token space at the end. (rather than staying in token space the whole way through).
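A rough way to see why a residual backbone might tolerate this kind of swap is the toy below. It is only a sketch under loud assumptions: the "layers" are small random matrices rather than trained transformer blocks, and `make_layer`, `apply_layer`, and `swap_layers` are hypothetical helpers, not anything from the paper or from my original experiments. Each layer applies `x + tanh(W x)`, so when the updates are small relative to `x`, reordering two layers changes the output only at second order.

```python
import math
import random

def make_layer(dim, scale, rng):
    """Random dim x dim weight matrix with small entries (toy stand-in for a block)."""
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]

def apply_layer(w, x):
    """Residual update: x + tanh(W x)."""
    return [xi + math.tanh(sum(wij * xj for wij, xj in zip(row, x)))
            for xi, row in zip(x, w)]

def forward(layers, x):
    """Run the input through every layer in order."""
    for w in layers:
        x = apply_layer(w, x)
    return x

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)

def swap_layers(layers, i, j):
    """Return a copy of the layer list with layers i and j exchanged."""
    out = list(layers)
    out[i], out[j] = out[j], out[i]
    return out

rng = random.Random(0)
dim, depth = 16, 12  # assumed toy sizes, not taken from any real model
layers = [make_layer(dim, 0.05, rng) for _ in range(depth)]
x0 = [rng.gauss(0.0, 1.0) for _ in range(dim)]

baseline = forward(layers, x0)
swapped = forward(swap_layers(layers, 4, 7), x0)
similarity = cosine(baseline, swapped)
print(f"cosine similarity after swapping layers 4 and 7: {similarity:.4f}")
```

This says nothing about why the first and last few layers are special in a trained model; it only illustrates that small residual updates approximately commute.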

2 comments

bigyikes 699 days ago

Do you have any indication whether this “universal” space might be shared between models, or is it unique to the architecture and training set?

Maybe it’s crazy, but is there any possibility that, say, Llama and Mistral use the same representation space?


chbint 699 days ago

No need to postulate platonic forms. All we need is the idea that there are real patterns to be mapped. The idea that distinct nets can share a representational space has been around at least since Laakso and Cottrell published "Content and cluster analysis: assessing representational similarity in neural systems" in 2000. If you search for "representational similarity analysis" you'll find more research on it.
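To make the idea concrete, here is a minimal sketch in the spirit of that line of work, not a reproduction of any published method. Instead of comparing units directly, it compares the pairwise distances each network assigns to the same set of stimuli. All data here is synthetic and the names `net_a`, `net_b`, and `net_c` are hypothetical: `net_b` is `net_a` with its units reordered and rescaled, and `net_c` is an unrelated control.

```python
import math
import random

def pairwise_distances(reps):
    """Euclidean distance for every pair of stimulus vectors (upper triangle)."""
    return [math.dist(reps[i], reps[j])
            for i in range(len(reps)) for j in range(i + 1, len(reps))]

def pearson(a, b):
    """Pearson correlation between two equal-length lists."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)

rng = random.Random(0)
num_stimuli, num_units = 20, 8  # assumed toy sizes

# "Network A": one activation vector per stimulus.
net_a = [[rng.gauss(0.0, 1.0) for _ in range(num_units)] for _ in range(num_stimuli)]

# "Network B": same geometry, but units are permuted and rescaled,
# so individual units do not line up with A's.
perm = list(range(num_units))
rng.shuffle(perm)
net_b = [[2.0 * row[k] for k in perm] for row in net_a]

# "Network C": unrelated random activations for the same stimuli.
net_c = [[rng.gauss(0.0, 1.0) for _ in range(num_units)] for _ in range(num_stimuli)]

score_ab = pearson(pairwise_distances(net_a), pairwise_distances(net_b))
score_ac = pearson(pairwise_distances(net_a), pairwise_distances(net_c))
print(f"A vs B: {score_ab:.3f}   A vs C: {score_ac:.3f}")
```

The point of the design is that the comparison never needs a unit-to-unit mapping between the two nets, only the same stimuli fed to both.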


dr_dshiv 699 days ago

> the same representation space

Platonic world of forms, perchance?


sjpalmer1994 699 days ago

Very interesting. I wonder if the middle layers could be shuffled during training to enforce this permutation symmetry.
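A minimal sketch of what I mean, assuming the first and last 3 layers stay fixed (taken from the "~3" in the quoted comment, so treat it as a guess) and that a fresh order is drawn per training step. `shuffled_layer_order` is a hypothetical helper, not part of any framework; a real training loop would index its list of blocks with the returned order.

```python
import random

def shuffled_layer_order(num_layers, num_fixed, rng):
    """Keep the first and last `num_fixed` layer indices in place; permute the rest."""
    if 2 * num_fixed > num_layers:
        raise ValueError("num_fixed is too large for this many layers")
    middle = list(range(num_fixed, num_layers - num_fixed))
    rng.shuffle(middle)
    head = list(range(num_fixed))
    tail = list(range(num_layers - num_fixed, num_layers))
    return head + middle + tail

rng = random.Random(0)
# One new order per (hypothetical) training step.
for step in range(3):
    order = shuffled_layer_order(num_layers=12, num_fixed=3, rng=rng)
    print(step, order)
```

Whether this actually helps or just hurts convergence is an open question; the sketch only pins down the sampling scheme.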
