|
|
|
|
|
by psyklic
896 days ago
|
|
Empirical studies are only useful until the system is mathematically understood. For example, I can construct transformer circuits where the skip connection (provably) purely adds noise. I can also prove in particular cases the MLP's sole purpose is to remove the noise added from the skip connection. |
|