by zo7
3527 days ago
My intuition is that these patterns "hijack" the ReLU activations in the lower layers, either keeping important features from firing or making features fire that shouldn't. The lower layers usually learn very primitive shapes like lines and curves, and I think (although I'd need to double-check) that they tend to pass through entire color channels rather than nuanced mixtures of colors. (So a feature would pass through all of red, all of blue, or all of both, rather than, say, 66% red, 47% blue, and 33% green; if it did the latter, it wouldn't generalize well.) The error then propagates through the network: later activations start firing in the wrong places, which causes the misclassification. (This is totally unsubstantiated, though.)
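To make the "hijacking" idea concrete, here's a toy sketch of a single hypothetical first-layer unit: a vertical-edge detector on one color channel of a 3x3 patch, with made-up weights and bias (`WEIGHTS`, `BIAS`, and the patch values are illustrative assumptions, not taken from any real network). Moving each pixel by at most 0.15 in the direction of its weight's sign is enough to switch the unit from silent to firing. It only illustrates the kind of flip described above; it says nothing about whether real networks actually behave this way.

```python
def relu(x: float) -> float:
    return max(0.0, x)

def sign(x: float) -> int:
    return (x > 0) - (x < 0)

# Hypothetical first-layer unit: a vertical-edge detector on one color channel
# of a 3x3 patch (+1 on the left column, -1 on the right column), plus a bias.
WEIGHTS = [[1.0, 0.0, -1.0] for _ in range(3)]
BIAS = -1.0

def unit_activation(patch):
    """Weighted sum of the patch plus bias, passed through ReLU."""
    pre = sum(w * p for w_row, p_row in zip(WEIGHTS, patch)
                    for w, p in zip(w_row, p_row))
    return relu(pre + BIAS)

def perturb(patch, eps):
    """Move every pixel by at most eps in the direction that raises the unit's
    pre-activation (the sign of its weight); zero-weight pixels are untouched."""
    return [[p + eps * sign(w) for w, p in zip(w_row, p_row)]
            for w_row, p_row in zip(WEIGHTS, patch)]

# A faint edge: the left column is only slightly brighter than the right one.
faint_edge = [[0.5, 0.45, 0.4] for _ in range(3)]

print(unit_activation(faint_edge))                 # unit stays silent
print(unit_activation(perturb(faint_edge, 0.15)))  # unit now fires
```

The same trick works in reverse (subtracting instead of adding) to silence a unit that should fire.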
|
> The surprising existence of universal perturbations reveals important geometric correlations among the high-dimensional decision boundary of classifiers. It further outlines potential security breaches with the existence of single directions in the input space that adversaries can possibly exploit to break a classifier on most natural images.
The paper unpacks that explanation pretty well, with actual pictures of the perturbations and how they relate to the classification boundary.
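For intuition about why a *single* direction can fool a classifier on many inputs at once, the simplest case is a linear classifier. The sketch below is a toy under that assumption (random synthetic inputs, a hypothetical weight vector `w` and bias `b`); it is not the paper's method, which works on deep networks. For a linear score w·x + b, adding the same delta = -eps·sign(w) to every input lowers every score by exactly eps·||w||_1, so one fixed perturbation pushes most class-A points across the boundary.

```python
import random

def score(w, b, x):
    """Linear decision function: positive means class A, otherwise class B."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def sign(v):
    return (v > 0) - (v < 0)

random.seed(0)
DIM = 100                      # hypothetical input size (e.g. a tiny flattened image)
w = [random.gauss(0.0, 1.0) for _ in range(DIM)]
b = -0.5 * sum(w)              # centers scores so roughly half the inputs are class A

# Synthetic inputs in [0, 1) that the classifier currently labels class A.
inputs = []
while len(inputs) < 200:
    x = [random.random() for _ in range(DIM)]
    if score(w, b, x) > 0:
        inputs.append(x)

# ONE perturbation shared by every input: each coordinate moves by eps against
# the sign of its weight (values are not clipped back to [0, 1] in this toy).
EPS = 0.1
delta = [-EPS * sign(wi) for wi in w]

def perturbed(x):
    return [xi + di for xi, di in zip(x, delta)]

shifts = [score(w, b, perturbed(x)) - score(w, b, x) for x in inputs]
fooled = sum(score(w, b, perturbed(x)) <= 0 for x in inputs)

l1 = sum(abs(wi) for wi in w)
print(f"every score drops by about {EPS * l1:.2f} (= eps * ||w||_1)")
print(f"fooling rate: {fooled / len(inputs):.0%}")
```

In a deep network the boundary is not a single hyperplane, so a universal direction has to be searched for rather than read off the weights; the toy only shows why one shared direction can be so effective when the boundary's normals are correlated across inputs.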