by std_throwaway 3261 days ago
Does this effect carry over to classifiers which were trained with different training data?

I am unsure what you mean-- do you mean with different training sets but the same testing set?

It's an interesting question; maybe (some of) these adversarial vulnerabilities are caused by a handful of bad training examples. You could formulate it as a search problem to see if there are particular images (or small groups of images) that are responsible for the adversarial vulnerabilities. This might then indicate that some of these perturbations are really just taking advantage of the fact that neural nets tend to "memorize" some of the data, so we're not really exploiting some deep structural feature so much as just feeding the net the echo of an input that it has learned to automatically classify as, say, a computer/desk[0].
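To make the "search problem" a bit more concrete, here is a rough sketch of a leave-one-group-out search. Everything in it is an assumption for illustration: synthetic Gaussian data stands in for images, a NumPy logistic regression stands in for a deep net, a single gradient-sign step stands in for the attack, and the helper names (`make_data`, `train_logreg`, `robust_accuracy`) are made up. On toy data like this no group is expected to stand out; the point is only the shape of the loop.

```python
import numpy as np

def make_data(n, d, rng):
    # Hypothetical stand-in for an image dataset: two Gaussian blobs.
    y = rng.integers(0, 2, size=n)
    X = np.where(y == 1, 0.5, -0.5)[:, None] + rng.normal(size=(n, d))
    return X, y

def train_logreg(X, y, steps=300, lr=0.1):
    # Plain gradient descent on the logistic loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def robust_accuracy(model, X, y, eps):
    # Accuracy after one gradient-sign step of size eps against this model.
    w, b = model
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_x = (p - y)[:, None] * w[None, :]  # d(loss)/d(input)
    X_adv = X + eps * np.sign(grad_x)
    return float((((X_adv @ w + b) > 0).astype(int) == y).mean())

rng = np.random.default_rng(0)
X_train, y_train = make_data(2000, 20, rng)
X_test, y_test = make_data(1000, 20, rng)
eps = 0.1

baseline = robust_accuracy(train_logreg(X_train, y_train), X_test, y_test, eps)

# Split the training set into 10 groups; retrain with each group held out.
groups = np.array_split(rng.permutation(len(X_train)), 10)
deltas = []
for g in groups:
    keep = np.ones(len(X_train), dtype=bool)
    keep[g] = False
    model = train_logreg(X_train[keep], y_train[keep])
    deltas.append(robust_accuracy(model, X_test, y_test, eps) - baseline)

# Groups whose removal improves robustness the most come first.
ranking = np.argsort(deltas)[::-1]
for i in ranking:
    print(f"group {i}: change in robust accuracy {deltas[i]:+.4f}")
```

With real deep nets every iteration of that loop is a full training run, which is exactly why this needs a pile of GPUs.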

It would be a good project, but I don't have enough GPUs on hand to train scores of deep nets from scratch.

Assuming one were to bite the bullet, it might also be worth trying different data augmentation strategies. Most of the time, we try to eke out additional performance/robustness by using the same sets of transformations (translation, rotation, cropping, rescaling, etc.), but if the net is vulnerable to adversarial examples because of something in the training set, then you might just be making sure that adversarial vulnerability is present everywhere in the image and at multiple scales.

On a related note, there's an interesting paper about universal adversarial perturbations, i.e. those that can be added to any image and thereby induce a misclassification with high probability[1]. This effect holds even across different models, so the same perturbation can cause a misclassification in different architectures.

------

0. Neural nets learn by some combination of abstraction and memorization. If, for some reason, many members of a particular class are hard to generalize from, then it's possible that the net instead learns to identify some particular aspects of that class (ones that are not usually present in other images) and responds disproportionately when those features are present. If such features are not obvious to human visual inspection, then we get misclassifications without insight into why they occurred.

1. https://arxiv.org/pdf/1610.08401.pdf

> I am unsure what you mean-- do you mean with different training sets but the same testing set?

Yes. Assume we have 10000 different training images. Divide these into 5 sets of 2000 each and train one network on each set. Assuming that 2000 images are plenty for this application, we will have 5 well-trained networks that have similar performance on a test set.

BUT

They will work slightly differently internally, and those "inverse gradient search" methods (or whatever they are called) might only be able to manipulate an image for one network at a time with "specifically chosen additive noise" while the other 4 are unimpressed.

That's assuming that the manipulation can't be targeted at all 5 classifiers at the same time.
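A minimal sketch of that experiment, under loud assumptions: synthetic Gaussian data instead of images, logistic regression in NumPy instead of deep nets, and a single gradient-sign step (FGSM-style) as the "specifically chosen additive noise". The helper names are hypothetical. Linear models trained on the same distribution tend to end up with similar weights, so this toy setup says little about how well the attack would transfer between real networks; it only shows how the 5-way split and the cross-model check could be wired up.

```python
import numpy as np

def make_data(n, d, rng):
    # Hypothetical stand-in for an image dataset: two Gaussian blobs.
    y = rng.integers(0, 2, size=n)
    X = np.where(y == 1, 0.5, -0.5)[:, None] + rng.normal(size=(n, d))
    return X, y

def train_logreg(X, y, steps=300, lr=0.1):
    # Plain gradient descent on the logistic loss.
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def accuracy(model, X, y):
    w, b = model
    return float((((X @ w + b) > 0).astype(int) == y).mean())

def gradient_sign_attack(model, X, y, eps):
    # Move each input by eps in the direction that increases this model's loss.
    w, b = model
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad_x = (p - y)[:, None] * w[None, :]  # d(loss)/d(input)
    return X + eps * np.sign(grad_x)

rng = np.random.default_rng(0)
X_train, y_train = make_data(10_000, 20, rng)
X_test, y_test = make_data(2_000, 20, rng)

# 5 disjoint training sets of 2000 examples each, one model per set.
shards = np.array_split(rng.permutation(len(X_train)), 5)
models = [train_logreg(X_train[idx], y_train[idx]) for idx in shards]

# Craft the perturbation against model 0 only, then evaluate all 5 models.
eps = 0.5
X_adv = gradient_sign_attack(models[0], X_test, y_test, eps)

clean_acc = [accuracy(m, X_test, y_test) for m in models]
adv_acc = [accuracy(m, X_adv, y_test) for m in models]
for i, (c, a) in enumerate(zip(clean_acc, adv_acc)):
    print(f"model {i}: clean accuracy {c:.3f}, accuracy on model-0 attack {a:.3f}")
```

If models 1 to 4 keep their clean accuracy on `X_adv`, the attack is specific to model 0; if their accuracy drops as well, the perturbation transfers. Targeting all 5 classifiers at once would amount to summing their input gradients before taking the sign.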