by colah3 441 days ago
Features correspond to vectors in activation space. So you can just do vector arithmetic!
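
To make the vector arithmetic concrete, here is a minimal NumPy sketch. It assumes a toy 4-dimensional activation and a made-up feature vector. The names `steer` and `feature_amount` are hypothetical and none of the numbers come from the papers linked below. Reading a feature off with a plain dot product is also a simplification chosen for this sketch.

```python
import numpy as np

def unit(v):
    """Scale a vector to length 1 so 'strength' has a consistent meaning."""
    return v / np.linalg.norm(v)

def feature_amount(activation, feature_direction):
    """Project an activation onto a feature direction (simple dot-product readout)."""
    return float(activation @ unit(feature_direction))

def steer(activation, feature_direction, strength):
    """Add `strength` units of the feature direction to the activation."""
    return activation + strength * unit(feature_direction)

# Toy values, purely illustrative (not taken from any real model).
activation = np.array([0.5, -1.0, 2.0, 0.0])
feature = np.array([0.0, 3.0, 4.0, 0.0])  # hypothetical feature vector

before = feature_amount(activation, feature)
steered = steer(activation, feature, strength=2.0)
after = feature_amount(steered, feature)

print(before, after)
```

Because the direction is normalized, adding it with `strength=2.0` raises the projection onto that direction by exactly 2 and leaves the orthogonal components untouched.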

If you aren't familiar with thinking about features, you might find it helpful to look at our previous work on features in superposition:

- https://transformer-circuits.pub/2022/toy_model/index.html

- https://transformer-circuits.pub/2023/monosemantic-features/...

- https://transformer-circuits.pub/2024/scaling-monosemanticit...