Many previous algorithms (adversarial training, distillation, most attacks, etc.) can be applied to 3D in a fairly straightforward manner because they are architecture-agnostic. However, they do not exploit properties that are specific to 3D point sets and 3D neural networks. For example, removing points as an attack or a defense is specific to point sets; you cannot really remove pixels from an image. The distribution of points in a point cloud also carries information that defenses can use, but an attacker can tamper with it as well (this is partially the focus of this work).
Similarly, adversarial attacks/defenses are still being proposed for graphs, audio, and other domains because we can leverage domain-specific knowledge.
Would you have a canonical name for this distribution? If you try matching log-likelihoods, what parametric family does it resemble? Briefly, given one of the two dozen or so canonical (uni/multi)variate distributions, one can create new distributions by location-scale transforms, by mixtures, or by using, say, a k-param EFD family. So if I pick a k-variate MVN (multivariate normal with k means, k sigmas and k(k-1)/2 = O(k^2) correlations), I can create new distributions all day long by tweaking these 2k + k(k-1)/2 params until the cows come home. Brittle inference engines such as CNNs, trained on a specific family with specific (hyper)parameters, will fail once the distribution changes significantly, even though the changes are visually imperceptible.
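To make the "new distributions all day long" point concrete, here is a minimal NumPy sketch of the three constructions mentioned: counting the free parameters of a k-variate normal, a location-scale transform, and a finite mixture. The function names are hypothetical illustrations, not from any library or from this work. Note that the exact free-parameter count is k means + k standard deviations + k(k-1)/2 correlations.

```python
import numpy as np


def mvn_param_count(k: int) -> int:
    """Free parameters of a k-variate normal: k means, k standard deviations
    and k(k-1)/2 pairwise correlations."""
    return 2 * k + k * (k - 1) // 2


def location_scale(x: np.ndarray, loc: np.ndarray, scale: np.ndarray) -> np.ndarray:
    """New distribution from an existing one via y = loc + scale * x (per coordinate)."""
    return loc + scale * x


def sample_mixture(rng, means, covs, weights, n):
    """Draw n samples from a finite mixture of multivariate normals."""
    comp = rng.choice(len(weights), size=n, p=weights)  # component label per sample
    out = np.empty((n, len(means[0])))
    for c in range(len(weights)):
        mask = comp == c
        out[mask] = rng.multivariate_normal(means[c], covs[c], size=int(mask.sum()))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    k = 3
    print(mvn_param_count(k))  # 3 means + 3 std devs + 3 correlations = 9
    base = rng.multivariate_normal(np.zeros(k), np.eye(k), size=1000)
    shifted = location_scale(base, np.array([1.0, 0.0, -1.0]), np.array([1.0, 0.5, 2.0]))
    mix = sample_mixture(rng, [np.zeros(k), np.ones(k)], [np.eye(k), 0.1 * np.eye(k)], [0.7, 0.3], 1000)
    print(shifted.shape, mix.shape)  # (1000, 3) (1000, 3)
```

Each of the 2k + k(k-1)/2 knobs yields a different member of the family, which is why a model fitted to one setting can be pushed off-distribution without any visible change.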
Sorry, I'm not very knowledgeable about the mathematical side of the distribution. Usually, we want a set of points on the surface of the object that maximizes the distance between each point and its nearest neighbors, so that the points are distributed with uniform density across the surface.
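The "maximize the distance to nearest neighbors" idea is commonly approximated by greedy farthest point sampling (FPS). Below is a minimal NumPy sketch, assuming the surface has already been densely sampled (e.g., from a mesh); it is a generic textbook version and not necessarily the sampling procedure used to produce the point clouds in this work.

```python
import numpy as np


def farthest_point_sampling(points: np.ndarray, m: int, start: int = 0) -> np.ndarray:
    """Greedy FPS: return m indices, each new one being the point farthest
    from every point already chosen."""
    n = points.shape[0]
    if not 0 < m <= n:
        raise ValueError("m must be between 1 and the number of points")
    chosen = np.empty(m, dtype=int)
    chosen[0] = start
    # Distance from each point to its nearest already-chosen point
    min_dist = np.linalg.norm(points - points[start], axis=1)
    for i in range(1, m):
        nxt = int(np.argmax(min_dist))
        chosen[i] = nxt
        min_dist = np.minimum(min_dist, np.linalg.norm(points - points[nxt], axis=1))
    return chosen


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dense = rng.normal(size=(5000, 3))
    dense /= np.linalg.norm(dense, axis=1, keepdims=True)  # dense samples on a unit sphere
    sparse = dense[farthest_point_sampling(dense, 256)]
    print(sparse.shape)  # (256, 3)
```

The greedy choice costs O(n·m) distance evaluations, which is why FPS is usually run once as preprocessing rather than inside a training loop.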
In my previous paper, I've shown that moving the points around on the surface of an object does lead to imperceptible but effective adversarial attacks, as you've observed.
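One simple way to move points "along the surface" is to project each point's perturbation onto the tangent plane given by its surface normal. This is a hypothetical sketch that assumes per-point normals are available; it is not the method from the previous paper. On a curved surface a tangent step is only correct to first order, so a real attack may also need to re-project points onto the surface.

```python
import numpy as np


def project_to_tangent(perturbation: np.ndarray, normals: np.ndarray) -> np.ndarray:
    """Remove each point's perturbation component along its surface normal,
    leaving only the in-plane (tangential) part."""
    unit = normals / np.linalg.norm(normals, axis=1, keepdims=True)
    normal_part = np.sum(perturbation * unit, axis=1, keepdims=True) * unit
    return perturbation - normal_part


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    points = np.c_[rng.random((100, 2)), np.zeros(100)]  # flat patch in the z = 0 plane
    normals = np.tile([0.0, 0.0, 1.0], (100, 1))
    step = project_to_tangent(0.01 * rng.normal(size=(100, 3)), normals)
    moved = points + step
    print(np.abs(moved[:, 2]).max())  # 0.0: points stay on the plane
```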
No worries. You have all the point clouds on your GitHub, so I’ll give it a shot one of these days. If you mathematize the distribution, you get a lot more mileage out of your results because you get interpretation for free. Changing moments (skew, etc.) will alter the distribution enough while remaining visually imperceptible.
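A cheap first step toward "mathematizing" the distribution is to summarize the nearest-neighbor distances of a point cloud by their first few moments. Using nearest-neighbor distance as the statistic is an assumption made here for illustration; the brute-force NumPy sketch below compares a perfectly regular grid with a slightly jittered copy.

```python
import numpy as np


def nn_distances(points: np.ndarray) -> np.ndarray:
    """Distance from each point to its nearest neighbor (brute force, O(n^2) memory)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # ignore each point's zero distance to itself
    return d.min(axis=1)


def moments(x: np.ndarray):
    """Mean, standard deviation and skewness of a 1-D sample."""
    mean, std = x.mean(), x.std()
    skew = 0.0 if std == 0.0 else float(np.mean(((x - mean) / std) ** 3))
    return mean, std, skew


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.array([[i, j, 0.0] for i in range(20) for j in range(20)])
    jittered = grid + 0.05 * rng.normal(size=grid.shape)  # small, visually minor change
    print(moments(nn_distances(grid)))      # perfectly regular: std 0, skew 0
    print(moments(nn_distances(jittered)))  # same shape, different moments
```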
I'm unconvinced by this statement. There have been many attempts to negate attacks by applying linear transformations, masks, etc. to images. Removing pixels is not novel.
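For comparison with point removal, "removing" pixels in the image domain usually means overwriting a random subset of them with a fixed value. This is a minimal hypothetical sketch of that idea, not a specific published defense.

```python
import numpy as np


def random_pixel_mask(image: np.ndarray, drop_prob: float, rng, fill=0.0) -> np.ndarray:
    """'Remove' pixels by overwriting a random subset with a fill value.
    Works for (H, W) or (H, W, C) images; one mask is shared across channels."""
    keep = rng.random(image.shape[:2]) >= drop_prob
    if image.ndim == 3:
        keep = keep[..., None]  # broadcast the mask over color channels
    return np.where(keep, image, fill)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    img = rng.random((32, 32, 3))
    masked = random_pixel_mask(img, 0.1, rng)
    print(masked.shape)  # (32, 32, 3): the grid never shrinks
```

Unlike dropping points from a set, the image keeps its shape; the "removed" pixels still exist, they just carry a placeholder value.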
We like to imply that domain knowledge is relevant, but after you design a feature vector, it all ends up the same.
That feature-vector statement doesn’t hold for audio, at least.
The time dimension adds complexity to the problem, as the optimal perturbation values depend on both the immediately surrounding values and many of the values before them.
When I say “hello world”, the fact that I said “e” depends on the fact that I said “h”; “l” depends on both “e” and “h”, and so on.
This adds an extra dimension to the problem.
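A toy illustration of the temporal-dependence point, using a first-order recursive filter as a stand-in for any model with memory (an assumption for illustration; a speech recognizer is far more complex): nudging a single sample changes every later output, so the effect of a perturbation depends on everything that came before it.

```python
import numpy as np


def recursive_filter(x: np.ndarray, a: float = 0.9) -> np.ndarray:
    """y[t] = x[t] + a * y[t-1]: every output depends on the entire past input."""
    y = np.empty(len(x))
    prev = 0.0
    for t, v in enumerate(x):
        prev = v + a * prev
        y[t] = prev
    return y


if __name__ == "__main__":
    clean = np.zeros(10)
    perturbed = clean.copy()
    perturbed[3] = 1.0  # perturb a single sample
    print(np.round(recursive_filter(perturbed) - recursive_filter(clean), 3))
```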
Also, distance metrics for images aren’t ideal for audio, for many reasons. That’s why audio signal processing is a separate subfield from image processing.
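As one example of an audio-specific distortion measure, some audio attack work (e.g., Carlini and Wagner's audio adversarial examples) measures a perturbation's peak loudness relative to the clean signal in decibels instead of using a raw L_p norm. A minimal sketch of that peak-based measure; the function names are ours.

```python
import numpy as np


def peak_db(x: np.ndarray) -> float:
    """Peak level in decibels: 20 * log10(max_i |x_i|)."""
    return float(20.0 * np.log10(np.max(np.abs(x))))


def relative_db(delta: np.ndarray, x: np.ndarray) -> float:
    """Loudness of perturbation delta relative to clean signal x (more negative = quieter)."""
    return peak_db(delta) - peak_db(x)


if __name__ == "__main__":
    t = np.linspace(0.0, 1.0, 16000, endpoint=False)
    delta = 0.01 * np.sign(np.sin(2 * np.pi * 50 * t))  # identical L_inf norm in both cases
    loud = 1.0 * np.sin(2 * np.pi * 440 * t)
    quiet = 0.05 * np.sin(2 * np.pi * 440 * t)
    print(relative_db(delta, loud), relative_db(delta, quiet))
```

The perturbation has the same L_inf norm in both calls, yet it is roughly 26 dB louder relative to the quiet signal, which is the kind of difference an image-style norm ignores.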
The approaches are similar, but in the end we have to use different tools because audio behaves differently from images. E.g., MFCC feature extraction is a variant of Fourier analysis, but specifically tailored to the human ear.
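A compact NumPy sketch of the usual MFCC pipeline: framing, windowing, FFT power spectrum, a triangular filterbank spaced evenly on the mel scale, log, then a DCT. The HTK-style mel formula and the defaults (512-point FFT, 160-sample hop, 26 filters, 13 coefficients) are common choices assumed here; real toolkits differ in details such as normalization, pre-emphasis and liftering.

```python
import numpy as np


def hz_to_mel(f):
    # HTK-style mel scale: roughly linear below ~1 kHz, logarithmic above
    return 2595.0 * np.log10(1.0 + np.asarray(f) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m) / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters whose centers are evenly spaced in mel, not in Hz
    mels = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fb[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fb[i - 1, k] = (right - k) / max(right - center, 1)
    return fb


def dct_ii(x, n_out):
    # Unnormalized DCT-II along the last axis, keeping the first n_out coefficients
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n) + 1) / (2 * n))
    return x @ basis.T


def mfcc(signal, sr, n_fft=512, hop=160, n_filters=26, n_coeffs=13):
    frames = [signal[s:s + n_fft] * np.hamming(n_fft)
              for s in range(0, len(signal) - n_fft + 1, hop)]
    power = np.abs(np.fft.rfft(np.array(frames), n=n_fft)) ** 2 / n_fft
    mel_energy = power @ mel_filterbank(n_filters, n_fft, sr).T
    return dct_ii(np.log(mel_energy + 1e-10), n_coeffs)


if __name__ == "__main__":
    sr = 16000
    tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)  # one second of a 440 Hz tone
    print(mfcc(tone, sr).shape)  # (97, 13): 97 frames x 13 coefficients
```

The mel spacing is the ear-specific part: filters are narrow at low frequencies and wide at high ones, unlike the uniform bins of a plain Fourier transform.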
See, e.g., Lea Schönherr et al.’s really good psychoacoustic attack paper.
On negating attacks through transforms: it’s important to remember that an ensemble of weak defences is not strong. Many attacks have been shown to be robust to simple transformations.
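The standard way attacks become robust to randomized transformations is expectation over transformation (EOT): the attacker averages the gradient over samples of the defense's random transform. The toy sketch below assumes a logistic-regression target and a hypothetical "random gain plus noise" defense, with L_inf sign steps. With a linear model EOT is nearly trivial; real attacks run the same loop on a neural network via automatic differentiation.

```python
import numpy as np


def random_transform(x, rng):
    # Hypothetical "simple transformation" defense: random gain plus small noise
    gain = rng.uniform(0.8, 1.2)
    return gain * x + rng.normal(scale=0.01, size=x.shape), gain


def eot_attack(x, w, b, eps, steps, n_samples, rng, lr=None):
    """Lower log sigmoid(w . t(x) + b), averaging the gradient over random transforms t."""
    lr = eps / 4 if lr is None else lr
    delta = np.zeros_like(x)
    for _ in range(steps):
        grad = np.zeros_like(x)
        for _ in range(n_samples):
            tx, gain = random_transform(x + delta, rng)
            z = w @ tx + b
            one_minus_p = np.exp(-np.logaddexp(0.0, z))  # 1 - sigmoid(z), overflow-safe
            grad += one_minus_p * gain * w  # d log p / d delta via the chain rule
        delta -= lr * np.sign(grad / n_samples)  # descend on log p
        delta = np.clip(delta, -eps, eps)        # stay within the L_inf budget
    return x + delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w, x = rng.normal(size=20), rng.normal(size=20)
    x_adv = eot_attack(x, w, 0.0, eps=0.1, steps=4, n_samples=8, rng=rng)
    print(w @ x, w @ x_adv)  # the score drops despite the randomized defense
```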
Yes, there are ideas similar to removing points, such as masks and other transformations. Removing points is simply a 3D equivalent of destroying potentially adversarial information. I guess you can "remove" a pixel by setting it to a certain color, so my statement is not entirely accurate. However, point-removal methods can take the distribution of points into account, which is unique to 3D point sets. Furthermore, there are many redundant points on the surface of an object, so removing a few points will not destroy the shape information.
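Statistical outlier removal (SOR) is one commonly used point-removal defense that relies on the point distribution: points whose k nearest neighbors are unusually far away are dropped. A minimal brute-force NumPy sketch; the k and alpha defaults are illustrative assumptions, not tuned values from this work.

```python
import numpy as np


def statistical_outlier_removal(points: np.ndarray, k: int = 10, alpha: float = 1.1):
    """Drop points whose mean distance to their k nearest neighbors exceeds
    mean + alpha * std of that statistic over the whole cloud."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)  # exclude each point itself
    knn_mean = np.sort(d, axis=1)[:, :k].mean(axis=1)
    threshold = knn_mean.mean() + alpha * knn_mean.std()
    keep = knn_mean <= threshold
    return points[keep], keep


if __name__ == "__main__":
    grid = np.array([[i, j, 0.0] for i in range(5) for j in range(5)])
    cloud = np.vstack([grid, [[100.0, 100.0, 100.0]]])  # one far-away outlier
    kept, mask = statistical_outlier_removal(cloud, k=4)
    print(kept.shape)  # (25, 3): only the outlier is removed
```

Because the surface is densely and redundantly sampled, dropping the few points flagged this way leaves the shape essentially intact.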
This paper does suggest that we can circumvent certain domain-specific knowledge when attacking. This does not mean that we won't discover ways to use domain-specific knowledge in the future. I would imagine that extending current provably robust methods to 3D would require domain-specific knowledge to handle the distribution of points.