Hacker News new | ask | show | jobs
by bilsbie 759 days ago
How are they handling attention in their approach?

That’s going to completely change what features are looked at.

1 comments

They target the residual stream. Also they may have a definition of “feature” that’s more general than what you’re using. Consider reading their superposition work.