by quinnftw
3322 days ago
I wonder if one could introduce a secondary classifier which is immune (or more resistant) to this kind of attack as a fail-safe. One idea that comes to mind is to back the neural net with a random forest, which I imagine would be very hard to trick with this kind of attack, since a collection of independent (key) weak learners is trained on the data. To trick a random forest, you would have to trick the majority of the trees within it.
Using a fail-safe network is hard because adversarial examples are usually assigned to a false class with high confidence, so a confidence threshold in the main network wouldn't work. Using a network as described in the paper and then a different kind of classifier might be worth trying. But it has also been shown that adversarial examples can transfer to different kinds of models (I don't know whether random forests have been tried as well).
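
For what it's worth, here is a rough sketch of the fail-safe idea from this thread, in plain Python. It assumes you already have the main network's predicted label and the individual votes of the backup forest's trees; the function names `majority_vote` and `failsafe_predict` are made up for illustration and are not from the paper or any library. It checks agreement between the two models rather than a confidence threshold, and it does nothing about the transfer problem mentioned above: if an adversarial example fools both models the same way, it passes straight through.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that the largest number of trees voted for."""
    # most_common(1) gives [(label, count)] for the top label.
    return Counter(votes).most_common(1)[0][0]


def failsafe_predict(primary_label, backup_votes):
    """Accept the primary model's label only if the backup ensemble agrees.

    Returns the label on agreement, or None to flag the input as suspicious.
    (Hypothetical helper: both arguments are assumed to come from models
    trained elsewhere.)
    """
    backup_label = majority_vote(backup_votes)
    return primary_label if primary_label == backup_label else None


# The network says "gibbon", but two of three trees still say "panda",
# so the input gets flagged (None) instead of being accepted.
flagged = failsafe_predict("gibbon", ["panda", "panda", "gibbon"])

# Both models agree, so the label is accepted.
accepted = failsafe_predict("panda", ["panda", "panda", "gibbon"])
```

An attacker would have to flip the network and more than half of the trees to the same wrong label to get past this check, which is the point of the random forest suggestion.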