by HighFreqAsuka
904 days ago
Empirically, yes: I can take a very deep fully-connected network, measure the gradients in each layer with and without skip connections, and compare. I can do this across multiple seeds and run a statistical test on the deltas.
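For concreteness, here is a rough sketch of what that experiment could look like. Everything specific in it is my own assumption for illustration rather than a fixed recipe: the tanh activation, the 1/sqrt(width) init, the toy loss 0.5 * ||h||^2, the depth/width/seed counts, and the helper names `layer_grad_norms` and `compare`. It uses only the standard library and a hand-written backward pass; in practice you would use an autodiff framework and something like `scipy.stats.ttest_rel` for the test.

```python
import math
import random
import statistics


def l2(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def layer_grad_norms(depth, width, skip, seed):
    """Frobenius norm of dLoss/dW for every layer of a square tanh MLP."""
    rng = random.Random(seed)
    std = 1.0 / math.sqrt(width)  # assumed init scale
    weights = [[[rng.gauss(0.0, std) for _ in range(width)]
                for _ in range(width)] for _ in range(depth)]
    h = [rng.gauss(0.0, 1.0) for _ in range(width)]  # one random input

    # Forward pass, caching each layer's input and tanh output.
    inputs, outputs = [], []
    for W in weights:
        a = [math.tanh(sum(w * v for w, v in zip(row, h))) for row in W]
        inputs.append(h)
        outputs.append(a)
        h = [hi + ai for hi, ai in zip(h, a)] if skip else a

    # Backward pass for the toy loss 0.5 * ||h_final||^2 (gradient is h_final).
    g = h
    norms = [0.0] * depth
    for layer in reversed(range(depth)):
        # Gradient w.r.t. the pre-activation: tanh'(z) = 1 - tanh(z)^2.
        dz = [gi * (1.0 - ai * ai) for gi, ai in zip(g, outputs[layer])]
        # dLoss/dW is the outer product of dz and the layer input,
        # so its Frobenius norm is the product of the two vector norms.
        norms[layer] = l2(dz) * l2(inputs[layer])
        back = [sum(weights[layer][i][j] * dz[i] for i in range(width))
                for j in range(width)]
        # With a skip connection the gradient also flows through the identity.
        g = [gi + bi for gi, bi in zip(g, back)] if skip else back
    return norms


def compare(depth=50, width=16, seeds=range(10)):
    """Paired comparison of first-layer gradient norms, skip vs. no skip."""
    deltas = []
    for seed in seeds:
        plain = layer_grad_norms(depth, width, skip=False, seed=seed)[0]
        resid = layer_grad_norms(depth, width, skip=True, seed=seed)[0]
        deltas.append(math.log10(resid) - math.log10(plain))
    mean = statistics.mean(deltas)
    # Paired t-statistic on the per-seed deltas.
    t_stat = mean / (statistics.stdev(deltas) / math.sqrt(len(deltas)))
    return deltas, mean, t_stat


deltas, mean_delta, t_stat = compare()
print(f"mean log10 gap at layer 0: {mean_delta:.2f}, paired t = {t_stat:.1f}")
```

Using the same seed for both variants means each pair shares its weights and input, so the deltas isolate the effect of the skip connection. Only the first layer is compared here, but `layer_grad_norms` returns every layer if you want the full profile.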
I can also prove that, in particular cases, the MLP's sole purpose is to remove the noise added by the skip connection.