by erhk
2444 days ago
>you cannot really remove pixels in an image

I'm unconvinced by this statement. There have been many attempts to negate attacks by applying linear transformations, masks, etc. to images, so removing pixels is not novel. We like to imply that domain knowledge is relevant, but once you have designed a feature vector it all ends up the same.

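To make the "removing pixels" idea concrete, here is a minimal sketch of a randomized input transformation that zeroes each pixel independently with some probability before the image reaches a classifier. The function name `random_pixel_mask`, the nested-list image representation and the `drop_prob`/`seed` parameters are all hypothetical illustrations, not any specific published defence.

```python
import random
from typing import List

Image = List[List[float]]  # hypothetical: a grayscale image as rows of pixel values


def random_pixel_mask(image: Image, drop_prob: float, seed: int = 0) -> Image:
    """Return a copy of `image` with each pixel zeroed with probability `drop_prob`."""
    if not 0.0 <= drop_prob <= 1.0:
        raise ValueError("drop_prob must be in [0, 1]")
    rng = random.Random(seed)  # seeded RNG so the mask is reproducible
    # rng.random() is in [0, 1), so drop_prob=0 keeps everything and drop_prob=1 drops everything
    return [[0.0 if rng.random() < drop_prob else px for px in row] for row in image]


if __name__ == "__main__":
    img = [[0.2, 0.4, 0.6], [0.8, 1.0, 0.3]]
    print(random_pixel_mask(img, drop_prob=0.5, seed=42))
```

Whether a transform like this actually helps is exactly what the reply disputes: an attacker who knows the transform can optimise a perturbation that survives it.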
The time dimension adds complexity to the problem, because the optimal perturbation values depend both on the immediately surrounding samples and on many of the samples before them.
When I say "hello world", the fact that I said "e" depends on the fact that I said "h", and the "l" depends on both "e" and "h", and so on.
That adds an extra dimension to the problem.
Also, distance metrics designed for images aren't ideal for audio, for many reasons. That's why audio signal processing is a separate subfield from image processing.
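As one example of an audio-specific distortion measure, here is a sketch of a relative loudness metric along the lines of the one used in Carlini and Wagner's audio adversarial examples work: the loudness of a waveform is taken as 20·log10 of its peak absolute amplitude, and the perturbation is scored by how many dB quieter it is than the original. The function names here are illustrative, and the sketch assumes non-empty waveforms given as plain float sequences.

```python
import math
from typing import Sequence


def peak_db(signal: Sequence[float]) -> float:
    """Loudness of a non-empty waveform as 20*log10 of its peak absolute amplitude."""
    peak = max(abs(s) for s in signal)
    if peak == 0.0:
        return float("-inf")  # a silent signal has no finite loudness
    return 20.0 * math.log10(peak)


def relative_distortion_db(original: Sequence[float], perturbation: Sequence[float]) -> float:
    """dB of the perturbation relative to the original; more negative means quieter."""
    return peak_db(perturbation) - peak_db(original)


if __name__ == "__main__":
    clean = [0.5, -1.0, 0.25]
    delta = [0.01, -0.005, 0.002]
    print(relative_distortion_db(clean, delta))  # roughly -40 dB
```

Unlike a per-pixel Lp bound, this is expressed on a logarithmic loudness scale relative to the signal itself, which is closer to how loud a perturbation is perceived.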
The approaches are similar, but in the end we have to use different tools because audio behaves differently from images. E.g. MFCC feature extraction builds on the Fourier transform, but maps the spectrum onto the mel scale, which is tailored to the human ear.
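The ear-specific part of MFCC is the mel scale: filters are spaced evenly in mel, which means they get wider in Hz as frequency increases. A minimal sketch of just that step, assuming the common HTK-style formula mel = 2595·log10(1 + f/700) (some libraries default to other variants, such as Slaney's); the function names are illustrative:

```python
import math
from typing import List


def hz_to_mel(f_hz: float) -> float:
    # HTK-style mel formula
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    # Exact inverse of hz_to_mel
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters: int, f_min: float, f_max: float) -> List[float]:
    """Centre frequencies (Hz) of n_filters triangular filters spaced evenly in mel."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (n_filters + 1)  # n_filters + 2 edge points, keep the inner ones
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    centres = mel_center_frequencies(10, 0.0, 8000.0)
    gaps = [b - a for a, b in zip(centres, centres[1:])]
    print([round(c) for c in centres])
    print([round(g) for g in gaps])  # gaps grow with frequency
```

A full MFCC pipeline would then apply these triangular filters to a framed power spectrum, take the log and finish with a DCT. The sketch stops at the filter spacing because that is the part tailored to human hearing.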
See, for example, Lea Schönherr et al.'s really good psychoacoustic attack paper.
On negating attacks through transforms: it's important to remember that an ensemble of weak defences is not a strong defence. Many attacks have been shown to be robust to simple transformations.