|
|
|
|
|
by radq
1089 days ago
|
|
We were missing two architecture patterns that were needed to get deeper nets to converge: residual nets [1] which solved gradient propagation, and batch normalization [2] which solved initialization. [1] Residual nets (2015): https://arxiv.org/abs/1512.03385 [2] Batch normalization (2015): https://arxiv.org/abs/1502.03167 |
|