Y
Hacker News
new
|
ask
|
show
|
jobs
by
bilsbie
759 days ago
How are they handling attention in their approach?
That’s going to completely change what features are looked at.
1 comments
tel
758 days ago
They target the residual stream. Also they may have a definition of “feature” that’s more general than what you’re using. Consider reading their superposition work.
link