
Here is a paper from Bosch in that direction; it uses a second network to classify inputs as adversarial: https://arxiv.org/abs/1702.04267

Using a fail-safe network is hard because adversarial examples are usually assigned high confidence for a wrong class, so a confidence threshold on the main network wouldn't work. Using a network as described in the paper and then a different kind of classifier might be worth trying. But it has also been shown that adversarial examples can transfer to different kinds of models (I don't know whether random forests have been tried as well).
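To make the threshold point concrete, here is a minimal toy sketch, not the paper's method. It assumes a hand-built linear logistic classifier; the weights `W`, the helpers `predict`, `fgsm_like` and `accept`, and the 0.9 threshold are all made up for illustration. A small per-feature step against the true class flips the prediction, and the wrong answer still passes the confidence check.

```python
import math

# Hypothetical toy model: a fixed linear logistic classifier over 100 features.
W = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
B = 0.0

def sign(v):
    return 1.0 if v > 0 else -1.0

def predict(x):
    """Return (predicted_class, confidence in that class)."""
    z = sum(w * xi for w, xi in zip(W, x)) + B
    p1 = 1.0 / (1.0 + math.exp(-z))
    return (1, p1) if p1 >= 0.5 else (0, 1.0 - p1)

def fgsm_like(x, true_class, eps):
    """Move each feature by eps in the direction that hurts the true class.

    For a linear model the sign of the input gradient is just sign(W),
    so this is a fast-gradient-sign style step.
    """
    direction = -1.0 if true_class == 1 else 1.0
    return [xi + direction * eps * sign(w) for w, xi in zip(W, x)]

def accept(x, threshold=0.9):
    """Naive fail-safe: trust the prediction only if confidence is high."""
    _, confidence = predict(x)
    return confidence >= threshold

# A clean input that the model puts in class 1 with high confidence.
clean = [0.1 * sign(w) for w in W]
# Perturb every feature by 0.2 against the true class.
adv = fgsm_like(clean, true_class=1, eps=0.2)

print("clean:", predict(clean), "accepted:", accept(clean))
print("adv:  ", predict(adv), "accepted:", accept(adv))
```

In this toy setup the perturbed input lands in class 0 with the same confidence the clean input had for class 1, so the threshold lets it through. Real networks are not linear, but the same effect is why thresholding the main network's output is not a reliable fail-safe.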