by arimorcos
3012 days ago
Author here. Indeed, our work is closely related to dropout. However, as we discuss in the paper (https://arxiv.org/abs/1803.06959), dropout doesn't really encourage the network to be robust to deletion in general; it only encourages robustness up to the dropout fraction used in training. So, for example, if you train the network with a 50% dropout rate, dropout will encourage the network to be robust to dropping 50% of the units, but the network could fail completely once 51% of the units are deleted and the training objective would be perfectly happy. As a result, dropout doesn't change the shapes of ablation curves, but rather simply horizontally scales them such that the left edge of the curves is at the dropout fraction rather than 0. In contrast, we found that batch normalization actually pulls the curves up and to the right, rather than simply scaling them, though we only have hints as to why that is. Hope that was helpful!
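For readers unfamiliar with the term, an ablation curve plots accuracy against the fraction of units deleted. Below is a rough NumPy sketch of how one might be computed. It is not the code from the paper: the toy data, the random-feature network, the least-squares readout, and the function name `ablation_curve` are all assumptions made for illustration, and the toy network uses neither dropout nor batch normalization, so it shows only the measurement, not the effects described above.

```python
import numpy as np

def ablation_curve(hidden, readout, labels, fractions, n_trials=20, seed=0):
    """Mean accuracy after zeroing a random fraction of hidden units.

    hidden:  (n_samples, n_units) hidden-layer activations
    readout: (n_units, n_classes) linear readout weights
    """
    rng = np.random.default_rng(seed)
    n_units = hidden.shape[1]
    curve = []
    for frac in fractions:
        n_drop = int(round(frac * n_units))
        accs = []
        for _ in range(n_trials):
            # Pick a fresh random subset of units to delete on every trial.
            mask = np.ones(n_units)
            mask[rng.choice(n_units, size=n_drop, replace=False)] = 0.0
            logits = (hidden * mask) @ readout
            accs.append(np.mean(np.argmax(logits, axis=1) == labels))
        curve.append(float(np.mean(accs)))
    return np.array(curve)

# Toy setup (hypothetical): random ReLU features plus a least-squares readout.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
labels = (X[:, 0] + X[:, 1] > 0).astype(int)
W = rng.normal(size=(10, 64))
hidden = np.maximum(X @ W, 0.0)
targets = np.eye(2)[labels]  # one-hot targets for the readout fit
readout, *_ = np.linalg.lstsq(hidden, targets, rcond=None)

fractions = np.linspace(0.0, 1.0, 11)
curve = ablation_curve(hidden, readout, labels, fractions)
for frac, acc in zip(fractions, curve):
    print(f"{frac:.1f} of units deleted -> accuracy {acc:.3f}")
```

To compare training regimes, one would run the same measurement on networks trained with and without dropout (or batch normalization) and overlay the resulting curves.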
Also, the network could give weird results for dropout values around 20%-30%, depending on how the robustness was 'learned'.