
by tMcGrath 3194 days ago

I agree - it's a surprising and cool paper. There has been some work on fooling network ensembles that are built by constructing a Bayesian posterior over weights using dropout [0]. That is an ensemble over weights for the same network, though, not over different architectures.

The basic idea here is that most of the time, each member of the ensemble will misclassify the adversarial example in a different way. This means that the posterior predictive distribution for adversarial examples ends up much broader, and you can detect them this way.
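To make the detection idea concrete, here is a minimal sketch in plain Python. It assumes you already have one class-probability vector per ensemble member (e.g. one per dropout sample). Using the entropy of the averaged prediction as the "breadth" measure, and the 0.5 threshold, are my own illustrative choices, not necessarily what [0] does, and the numbers are made up.

```python
import math

def predictive_mean(member_probs):
    """Average the class probabilities over ensemble members."""
    n = len(member_probs)
    k = len(member_probs[0])
    return [sum(p[c] for p in member_probs) / n for c in range(k)]

def predictive_entropy(member_probs):
    """Entropy (in nats) of the averaged predictive distribution."""
    mean = predictive_mean(member_probs)
    return -sum(q * math.log(q) for q in mean if q > 0.0)

def looks_adversarial(member_probs, threshold=0.5):
    """Flag an input whose averaged prediction is too broad.

    The threshold is a hypothetical value; in practice it would be
    tuned on held-out clean and adversarial inputs.
    """
    return predictive_entropy(member_probs) > threshold

# Made-up outputs: members agree on a clean input...
clean = [[0.97, 0.02, 0.01], [0.95, 0.03, 0.02], [0.96, 0.02, 0.02]]
# ...but each one misclassifies the adversarial input differently.
disagreeing = [[0.90, 0.05, 0.05], [0.05, 0.90, 0.05], [0.05, 0.05, 0.90]]

print(looks_adversarial(clean), looks_adversarial(disagreeing))
```

Each member in the second case is individually confident, but because they disagree the average is close to uniform, and that is the signal being thresholded.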

Surprisingly, even this can be beaten in the white-box case [1], although it's by far the hardest to beat of the current adversarial defences, and needs much more distortion. It's beaten exactly as the GP says, by differentiating through the averaging. AFAIK no-one's tried architecture ensembling, but I expect it would be vulnerable to the same technique.
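A toy illustration of what "differentiating through the averaging" means, again in plain Python. This is not the attack from [1]: it assumes a hypothetical ensemble of tiny logistic models with made-up weights, and takes a single signed-gradient step on the averaged output rather than running a proper optimisation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def ensemble_prob(weights, biases, x):
    """Averaged probability of class 1 over all ensemble members."""
    probs = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
             for w, b in zip(weights, biases)]
    return sum(probs) / len(probs)

def ensemble_grad(weights, biases, x):
    """Gradient of the averaged probability with respect to the input x.

    The average is linear, so this is just the mean of the per-member
    gradients s * (1 - s) * w.
    """
    grad = [0.0] * len(x)
    for w, b in zip(weights, biases):
        s = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for j, wj in enumerate(w):
            grad[j] += s * (1.0 - s) * wj / len(weights)
    return grad

def sign_step(weights, biases, x, eps):
    """Move each coordinate of x by eps against the gradient of the average."""
    g = ensemble_grad(weights, biases, x)
    return [xi - eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, g)]

# Hypothetical three-member ensemble over a two-dimensional input.
weights = [[1.0, 0.5], [0.8, -0.2], [1.2, 0.3]]
biases = [0.0, 0.1, -0.1]
x = [1.0, 1.0]

x_adv = sign_step(weights, biases, x, eps=0.5)
print(ensemble_prob(weights, biases, x), ensemble_prob(weights, biases, x_adv))
```

The point is that the attacker targets the ensemble's combined output directly, so a direction that hurts the average gets picked even when individual members disagree about it (note the second member's negative weight on the second coordinate).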

[0] https://arxiv.org/abs/1703.00410

[1] https://arxiv.org/abs/1705.07263