|
|
|
|
|
by adamnemecek
1231 days ago
|
|
Match is a bad word, the don’t match, they are duals. The residual stream aka identity mapping needs to be the identity of the attention mechanism as the attention mechanism learns. But this is the same for all residual streams, not just those in transformers. Join my discord to discuss this further https://discord.gg/mr9TAhpyBW |
|
Edit: ok, Discord it is.