Hacker News
by SubiculumCode 2775 days ago
I am not a machine learning expert, but couldn't these adversarial-example issues be addressed by (1) producing multiple non-equivalent classification solutions with adequate accuracy, then (2) fusing them (e.g. by voting) to produce a consensus classification? (3) Maybe also randomly shuffling which X of Z solutions get to vote in each classification attempt.

What might fool one solution might not fool another, and adversarial examples seem to depend on idiosyncrasies of a particular solution.
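
To make the proposal concrete, here is a minimal sketch of steps (1) to (3) in plain Python. Everything in it is hypothetical: the three toy "classifiers" are simple threshold rules standing in for independently trained models, and the names `vote`, `by_mean`, `by_max` and `by_first` are made up for illustration.

```python
import random
from collections import Counter
from typing import Callable, Optional, Sequence

# A "classifier" here is any function from a list of pixel values to a label.
Classifier = Callable[[Sequence[float]], str]


def vote(classifiers: Sequence[Classifier], x: Sequence[float], k: int,
         rng: Optional[random.Random] = None) -> str:
    """Pick k of the available classifiers at random and return the majority label."""
    rng = rng or random.Random()
    voters = rng.sample(list(classifiers), k)   # step (3): random X of Z
    ballots = Counter(clf(x) for clf in voters)  # step (2): hard voting
    return ballots.most_common(1)[0][0]


# Step (1): three non-equivalent toy solutions (assumed, not real models).
def by_mean(x: Sequence[float]) -> str:
    return "dog" if sum(x) / len(x) > 0.5 else "elephant"


def by_max(x: Sequence[float]) -> str:
    return "dog" if max(x) > 0.8 else "elephant"


def by_first(x: Sequence[float]) -> str:
    return "dog" if x[0] > 0.5 else "elephant"


classifiers = [by_mean, by_max, by_first]

# All three agree on this input, so any subset of voters says "dog".
print(vote(classifiers, [0.9, 0.6, 0.7], k=2))

# Here by_max disagrees with the other two; with all three voting,
# the majority still says "elephant".
print(vote(classifiers, [0.1, 0.9, 0.2], k=3))
```

With an odd `k` and two labels there are no ties; for more classes a tie-breaking rule would have to be chosen.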

3 comments

When we do that (and we usually do it for accuracy improvements, not resistance to attack; the two happen at the same time for the same reason, since adversarial examples are a misclassification problem), we change the exact attack that breaks the system, but there is no 100% accurate system. And since the way the system works is completely foreign to us, the examples that a machine would misclassify would likely not confuse a human.

The issue is that classification currently relies on a very small embedding of the data which is pattern-matched, with no semantics. It has no way of telling that the difference between a dog and an elephant ISN'T that noise gradient, at least some of the time!

Some of them, yeah. There is active research on this. But it is also possible to create adversarial images for soft voting ensembles of the 6 most popular architectures. Those strong adversarial images that beat the consensus also have a large chance of fooling new neural network architectures that the adversarial image creator never had access to.
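
For readers unfamiliar with the term, a soft voting ensemble sums (or averages) the class probabilities of its members instead of counting one ballot per model. A rough sketch, assuming each model returns a dict of label to probability; `soft_vote` and the numbers below are invented for illustration and are not taken from any of the architectures mentioned above:

```python
from typing import Dict, Sequence


def soft_vote(prob_outputs: Sequence[Dict[str, float]]) -> str:
    """Sum per-class probabilities across models and return the top label."""
    totals: Dict[str, float] = {}
    for probs in prob_outputs:
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=lambda label: totals[label])


# Made-up outputs of three models for one image.
outputs = [
    {"dog": 0.90, "elephant": 0.10},  # very confident in "dog"
    {"dog": 0.40, "elephant": 0.60},  # leans "elephant"
    {"dog": 0.45, "elephant": 0.55},  # leans "elephant"
]

# Hard voting would pick "elephant" (2 ballots to 1); soft voting picks
# "dog" because the summed probability is 1.75 vs 1.25.
print(soft_vote(outputs))
```

The example also shows why the two voting schemes can disagree: one confident model can outweigh two hesitant ones.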
Or just adding a small amount of random noise to the input, which would wipe out the carefully constructed attack.
You can try out this technique at https://github.com/google/unrestricted-adversarial-examples. My guess is it would have the same result as adding noise to the normal images too (resulting in a slightly worse performance overall).
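
The noise idea above can be sketched in a few lines. This assumes pixel values scaled to [0, 1] and Gaussian noise; the function names, the `sigma` default and the wrapper are illustrative choices, not anything taken from the linked repository.

```python
import random
from typing import Callable, List, Optional, Sequence


def add_noise(pixels: Sequence[float], sigma: float = 0.05,
              rng: Optional[random.Random] = None) -> List[float]:
    """Add zero-mean Gaussian noise to each pixel and clip back to [0, 1]."""
    rng = rng or random.Random()
    return [min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in pixels]


def classify_with_noise(classifier: Callable[[Sequence[float]], str],
                        pixels: Sequence[float], sigma: float = 0.05,
                        rng: Optional[random.Random] = None) -> str:
    """Perturb the input randomly before handing it to the classifier."""
    return classifier(add_noise(pixels, sigma, rng))


# Toy stand-in for a real model (assumption): a threshold on mean brightness.
def toy_classifier(pixels: Sequence[float]) -> str:
    return "dog" if sum(pixels) / len(pixels) > 0.5 else "elephant"


print(classify_with_noise(toy_classifier, [0.9, 0.8, 0.7], sigma=0.05))
```

The same noise is applied to clean inputs as well, which is where the slightly worse overall performance would come from: inputs that sit close to a decision boundary can be pushed across it.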