by chuckbot 3290 days ago
In principle you're right, but at least for computer vision the layer counts you mention are a bit off. VGG16 worked well with 16 layers without any special handling. ResNet went to >150 layers by using shortcuts, which kind of cracked the problem already. This paper gives us more insight and maybe a more elegant solution.

edit: Just realized you said 2/3 _fully connected layers_, which is right. But for convolutions we needed skip connections, too, to get them to work. Any reason you single out fully connected layers?
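
For anyone unfamiliar with the shortcut idea, here is a rough NumPy sketch of what a skip connection does. It is only an illustration under my own assumptions: it uses dense layers instead of convolutions, leaves out batch norm, and the names `plain_block` and `residual_block` are made up for this example, not taken from the ResNet code.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def plain_block(x, w1, w2):
    # Two stacked layers with no shortcut: the input only reaches
    # the output by passing through both weight matrices.
    return relu(relu(x @ w1) @ w2)

def residual_block(x, w1, w2):
    # Same two layers, but the input is added back before the final ReLU,
    # so the block only has to learn a correction on top of the identity.
    return relu(x + relu(x @ w1) @ w2)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))   # hypothetical batch of 4 inputs, 8 features
w_zero = np.zeros((8, 8))     # degenerate weights that pass nothing through

plain_out = plain_block(x, w_zero, w_zero)        # signal is lost entirely
residual_out = residual_block(x, w_zero, w_zero)  # input still passes via the shortcut
```

With the weights zeroed out, the plain block outputs all zeros while the residual block still passes `relu(x)` through, which is roughly why stacking many such blocks stays trainable.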

2 comments

Regarding your edit, the authors of the paper in question focus on FNNs and note the reason in the paper:

> Both RNNs and CNNs can stabilize learning via weight sharing, therefore they are less prone to these perturbations. In contrast, FNNs trained with normalization techniques suffer from these perturbations and have high variance in the training error (see Figure 1).

Essentially, FNNs stand to benefit more from this work than CNNs or RNNs do.

That is the point of the paper: making deep fully-connected networks work.