
I'm not sure what you mean. Do you mean with different training sets but the same test set?

It's an interesting question; maybe (some of) these adversarial vulnerabilities are due to a handful of bad training examples. You could formulate it as a search problem to see whether there are particular images (or small groups of images) that are responsible for the vulnerabilities. That might indicate that some of these perturbations are really just taking advantage of the fact that neural nets tend to "memorize" some of the data, so we're not exploiting some deep structural feature so much as feeding the net the echo of an input that it has learned to automatically classify as, say, a computer/desk[0].
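
To make the search idea concrete, here is a toy sketch, not a real experiment. It swaps the deep net for a 1-D two-class threshold model (the midpoint between class means) so that "retraining" is free, and it measures vulnerability as the fraction of test points that sit within eps of the decision threshold. Every name and number in it (fit_threshold, vulnerability, leave_group_out, the data) is made up for illustration.

```python
# Toy stand-in for "retrain without a group of training images and
# re-measure adversarial vulnerability". All names are hypothetical.

def fit_threshold(xs, ys):
    """Fit a 1-D two-class model: the midpoint between the class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2

def vulnerability(threshold, test_xs, eps):
    """Fraction of test points that a shift of size <= eps can move to the threshold."""
    return sum(abs(x - threshold) <= eps for x in test_xs) / len(test_xs)

def leave_group_out(xs, ys, groups, test_xs, eps):
    """Return (baseline, {group index: drop in vulnerability without that group})."""
    baseline = vulnerability(fit_threshold(xs, ys), test_xs, eps)
    drops = {}
    for g, removed in enumerate(groups):
        kept = [i for i in range(len(xs)) if i not in removed]
        thr = fit_threshold([xs[i] for i in kept], [ys[i] for i in kept])
        drops[g] = baseline - vulnerability(thr, test_xs, eps)
    return baseline, drops

# Class 0 contains one "bad" example (3.0) that drags the threshold toward class 1.
train_xs = [0.0, 0.2, 0.4, 3.0, 2.0, 2.2, 2.4]
train_ys = [0, 0, 0, 0, 1, 1, 1]
test_xs = [0.1, 0.3, 0.5, 1.85, 2.15, 2.35]
groups = [[i] for i in range(len(train_xs))]  # one training example per group

baseline, drops = leave_group_out(train_xs, train_ys, groups, test_xs, eps=0.4)
suspect = max(drops, key=drops.get)  # group whose removal helps the most
print(suspect, train_xs[suspect])
```

In this toy setup, removing the outlier at 3.0 is the only deletion that lowers the vulnerability score. With a real net, each iteration of that loop is a full training run, which is where the cost comes from.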

It would be a good project, but I don't have enough GPUs on hand to train scores of deep nets from scratch.

Assuming one were to bite the bullet, it might also be worth trying different data augmentation strategies. Most of the time, we try to eke out additional performance/robustness by using the same sets of transformations (translation, rotation, cropping, rescaling, etc.), but if the net is vulnerable to adversarial examples because of something in the training set, then you might just be making sure that adversarial vulnerability is present everywhere in the image and at multiple scales.
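
As a rough illustration of that worry (an assumption-laden toy, not a claim about any real pipeline): take a 1-D "image" containing a single made-up spurious pattern, apply translation augmentation as cyclic shifts, and look at where the pattern ends up across the augmented set. The helper names shift and pattern_positions are hypothetical, and real pipelines usually pad or crop rather than wrap around.

```python
def shift(image, k):
    """Cyclically shift a 1-D 'image' by k pixels (a stand-in for translation)."""
    k %= len(image)
    return image[-k:] + image[:-k]

def pattern_positions(image, pattern):
    """Start indices at which the pattern occurs, allowing wrap-around."""
    n, m = len(image), len(pattern)
    return [i for i in range(n)
            if all(image[(i + j) % n] == pattern[j] for j in range(m))]

image = [0, 0, 9, 7, 0, 0, 0, 0]   # [9, 7] plays the role of the spurious feature
pattern = [9, 7]

# Augment with every possible translation, then collect where the pattern appears.
augmented = [shift(image, k) for k in range(len(image))]
covered = sorted({p for img in augmented for p in pattern_positions(img, pattern)})
print(covered)
```

The pattern starts out at a single position, but after augmentation it shows up at every position in the training set, which is the "present everywhere in the image" effect in miniature.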

On a related note, there's an interesting paper about universal adversarial perturbations, i.e. those that can be added to any image and thereby induce a misclassification with high probability[1]. This effect holds even across different models, so the same perturbation can cause a misclassification in different architectures.
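
To pin down what "universal" means here, a minimal sketch: one fixed perturbation v is added to every input, and we count how often the predicted label changes. This only evaluates a given v on a made-up binary linear classifier; it is not the algorithm from the paper for finding such a perturbation, and predict, fooling_rate, the weights, and the inputs are all hypothetical.

```python
def predict(w, b, x):
    """Binary linear classifier: class 1 if w.x + b > 0, else class 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def fooling_rate(w, b, inputs, v):
    """Fraction of inputs whose predicted label changes when the same v is added."""
    flipped = 0
    for x in inputs:
        x_adv = [xi + vi for xi, vi in zip(x, v)]
        if predict(w, b, x_adv) != predict(w, b, x):
            flipped += 1
    return flipped / len(inputs)

w, b = [1.0, -1.0], 0.0
inputs = [(1.0, 0.5), (0.6, 0.2), (2.0, 0.0), (0.0, 1.0)]
v = [-0.5, 0.5]  # a single perturbation shared by every input

rate = fooling_rate(w, b, inputs, v)
print(rate)
```

The transfer claim would correspond to computing the same rate for a second, independently trained model while keeping v fixed.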

------

0. Neural nets learn by some combination of abstraction and memorization. If, for some reason, many members of a particular class are hard to generalize from, then it's possible that the net instead learns to identify particular aspects of that class (ones not usually present in other images) and responds disproportionately when those features are present. If such features are not obvious to human visual inspection, then we get misclassifications without any insight into why they happened.

1. https://arxiv.org/pdf/1610.08401.pdf