by colah3
1259 days ago
I'm glad you've enjoyed it! If you like the idea of a periodic table of features, you might like the Early Vision article from the original Distill circuits thread: https://distill.pub/2020/circuits/early-vision/ We've had a much harder time isolating features in language models than vision models (especially early vision), so I think we have a clearer picture there. And it seems remarkably structured! My guess is that language models are just making very heavy use of superposition, which makes it much harder to tease apart the features and develop a similar picture. Although we did get a tiny bit of traction here: https://transformer-circuits.pub/2022/solu/index.html#sectio...