
For deep convnets, the vanishing gradient problem can mostly be solved by using residual architectures, where each block adds its output to its own input. See: https://arxiv.org/abs/1603.05027
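To make the intuition concrete, here is a scalar toy model (a rough sketch, not the analysis from the paper). It assumes every layer has the same small local derivative; the function names and the value 0.01 are made up for illustration. In a plain chain x_{l+1} = f(x_l) the end-to-end gradient is the product of the local derivatives, while with an identity skip x_{l+1} = x_l + f(x_l) each factor becomes 1 + f'(x_l).

```python
def plain_gradient(local_grads):
    """End-to-end derivative of a plain chain x_{l+1} = f_l(x_l)."""
    g = 1.0
    for d in local_grads:
        g *= d  # chain rule: multiply the local derivatives
    return g


def residual_gradient(local_grads):
    """End-to-end derivative of a residual chain x_{l+1} = x_l + f_l(x_l)."""
    g = 1.0
    for d in local_grads:
        g *= 1.0 + d  # the identity path contributes the constant 1
    return g


depth = 50                    # illustrative depth (assumption)
local_grads = [0.01] * depth  # illustrative small derivative (assumption)

print("plain:   ", plain_gradient(local_grads))
print("residual:", residual_gradient(local_grads))
```

In this toy setting the plain product collapses towards zero while the residual product stays of order one. Real networks have matrix-valued Jacobians, so this only illustrates the mechanism.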

This is loosely related to how additive recurrent architectures such as LSTMs and GRUs mitigate the vanishing gradient issue in RNNs: the state is updated by gated addition rather than being fully overwritten at each step.

Alternatively, it's possible to use concatenative skip connections, as in DenseNets: https://arxiv.org/abs/1608.06993
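One practical difference from additive skips is that concatenation makes the input width grow along the block. A small sketch of the bookkeeping, assuming a dense block where each layer emits a fixed number of new channels (the "growth rate") and receives the concatenation of the block input and all earlier layer outputs; the function name and the example numbers are only illustrative:

```python
def dense_block_input_channels(k0, growth_rate, num_layers):
    """Input channel count seen by each layer of a dense block.

    k0          -- channels entering the block
    growth_rate -- channels each layer adds to the concatenated features
    num_layers  -- number of layers in the block
    """
    # Layer i (0-based) sees the block input plus the outputs of the i layers before it.
    return [k0 + growth_rate * i for i in range(num_layers)]


# Illustrative values, not taken from any specific published configuration.
print(dense_block_input_channels(k0=64, growth_rate=32, num_layers=4))
```

With an additive (residual) skip the channel count would stay at k0 throughout, which is why DenseNet-style blocks are usually paired with a small growth rate.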

Still, using 1000 layers is useless in practice. State-of-the-art image classification models are in the range of 30-100 layers, with residual connections and a number of channels per layer that varies with the depth so as to keep the total number of trainable parameters tractable. The 1000-layer nets are only interesting as a memory scalability benchmark for DL frameworks and as empirical validation that the optimization problem is feasible; they are of no practical use otherwise (as far as I know).
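The depth versus width trade-off can be sketched with back-of-the-envelope arithmetic. This assumes a deliberately simplified model: a stack of identical 3x3 convolutions with the same number of input and output channels, no biases, no normalization parameters and no classifier head. The 25M weight budget and the function names are assumptions chosen for illustration only.

```python
import math


def conv_stack_params(depth, channels, kernel=3):
    """Weights in `depth` conv layers, each mapping channels -> channels (no biases)."""
    return depth * kernel * kernel * channels * channels


def width_for_budget(depth, budget, kernel=3):
    """Largest uniform channel count whose stack fits within `budget` weights."""
    return math.isqrt(budget // (depth * kernel * kernel))


budget = 25_000_000  # hypothetical parameter budget
for depth in (50, 100, 1000):
    width = width_for_budget(depth, budget)
    print(depth, width, conv_stack_params(depth, width))
```

Under a fixed budget the affordable width shrinks roughly with the square root of the depth, so a 1000-layer stack ends up with very narrow layers.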