| HN Mirror

that's the idea! we know about adversarial inputs at inference time, this paper talks about adversarial perturbation of the model itself during training. what about undetectable adversarial training inputs where people do their own training but the model still ends up with hard to find (except for the adversary) weaknesses?