by chuckbot
3290 days ago
In principle you're right, but at least for computer vision the number of layers you mention is a bit off. VGG16 worked well with 16 layers without any special handling. ResNet went to >150 layers by using shortcuts, which kind of cracked the problem already. This paper gives us more insight and maybe a more elegant solution. Edit: Just realized you said 2/3 _fully connected layers_, which is right. But for convolutions we needed skip connections, too, to get them to work. Any reason you single out fully connected layers?
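To make "shortcuts" concrete, here's a toy sketch in plain Python. It is not ResNet itself: the layers are fully connected rather than convolutional, the weights are deliberately tiny (std 0.01, my choice for illustration), and it only looks at the forward pass. The function names `layer`, `residual_block`, and `run` are made up for this example. The point is just that adding the input back onto the layer's output keeps the signal alive through many layers, while the plain stack shrinks it toward zero.

```python
import math
import random


def layer(x, w):
    """One fully connected layer with tanh: y_i = tanh(sum_j w[i][j] * x[j])."""
    return [math.tanh(sum(wij * xj for wij, xj in zip(row, x))) for row in w]


def residual_block(x, w):
    """Shortcut connection: output is the input plus the layer's output."""
    return [xi + fi for xi, fi in zip(x, layer(x, w))]


def run(depth, width, use_shortcut, seed=0):
    """Push a vector of ones through `depth` randomly initialised layers."""
    rng = random.Random(seed)
    x = [1.0] * width
    for _ in range(depth):
        # Assumption: small Gaussian weights, chosen so the plain stack shrinks the signal.
        w = [[rng.gauss(0.0, 0.01) for _ in range(width)] for _ in range(width)]
        x = residual_block(x, w) if use_shortcut else layer(x, w)
    return x


if __name__ == "__main__":
    plain = run(depth=20, width=8, use_shortcut=False)
    skip = run(depth=20, width=8, use_shortcut=True)
    print("plain stack, largest activation:", max(abs(v) for v in plain))
    print("with shortcuts, largest activation:", max(abs(v) for v in skip))
```

Running it prints the largest activation after 20 layers for both variants. The same identity path that carries the signal forward is also what lets gradients flow backward, though this sketch doesn't compute gradients.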
> Both RNNs and CNNs can stabilize learning via weight sharing, therefore they are less prone to these perturbations. In contrast, FNNs trained with normalization techniques suffer from these perturbations and have high variance in the training error (see Figure 1).
Essentially FNNs stand to benefit more from this work than CNNs or RNNs.