Hacker News
by fny 454 days ago
How do you all add and subtract concepts in the rabbit poem?
1 comment

Features correspond to vectors in activation space, so adding or subtracting a concept is just vector arithmetic on the activations!
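To make the arithmetic concrete, here is a toy sketch. It is not the setup from the paper: the 4-dimensional space, the `rabbit` and `green` directions, and the `steer` helper are all made up for illustration. The idea is to measure how strongly the activation points along one feature direction, subtract that component, and add the same amount along another direction.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def unit(v):
    """Rescale v to unit length so projections are comparable."""
    norm = math.sqrt(dot(v, v))
    return [a / norm for a in v]

def steer(activation, remove, add):
    """Swap one feature for another by vector arithmetic.

    Subtracts the component of `activation` along `remove`, then adds
    a component of the same strength along `add`.
    """
    remove, add = unit(remove), unit(add)
    strength = dot(activation, remove)  # how active the feature is
    return [a - strength * r + strength * g
            for a, r, g in zip(activation, remove, add)]

# Hypothetical feature directions in a toy 4-d activation space.
rabbit = [1.0, 0.0, 0.0, 0.0]
green = [0.0, 1.0, 0.0, 0.0]

# A made-up activation in which the "rabbit" feature is strongly active.
activation = [3.0, 0.5, -1.0, 2.0]

steered = steer(activation, remove=rabbit, add=green)
print(steered)
```

In a real model the directions would come from a learned feature dictionary rather than being axis-aligned, and the edited activation would be written back into the forward pass at the relevant layer and token position.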

If you aren't familiar with thinking about features, you might find it helpful to look at our previous work on features in superposition:

- https://transformer-circuits.pub/2022/toy_model/index.html

- https://transformer-circuits.pub/2023/monosemantic-features/...

- https://transformer-circuits.pub/2024/scaling-monosemanticit...