Hacker News
by therajiv 3253 days ago
It's not clear to me how malicious actors can manipulate this observation to confuse self-driving cars. That said, I don't think this discredits the point of the article; it's important to note how easily deep learning models can be fooled if you understand the math behind them. I just think the example of tricking self-driving cars is difficult to relate to or understand.

Why do you say that? The first demo they provide shows that the adversarial image, when printed and then manipulated, still fools the algorithm. That means the example is robust not only to various affine transformations but also to the per-pixel noise that results from printing something and then viewing it again through a camera.

Suppose you were to place an example like that on a stop sign that fooled a car into thinking that it was a tree. The car might blow through an intersection at speed as a result.

The training strategy they used provides a template for doing even more exotic manipulations. For example, you could train an adversarial example that looked like one thing when viewed from far away but something quite different up close. Placing an image like that by a road could result in an acute, unexpected change in the car's behavior (e.g. veering sharply to avoid a "person" that suddenly appeared).

You provide great examples, thanks. I guess I was just hoping that the article would spell out those situations as clearly as you did.
Though I generally agree with your point, the tree vs. stop sign example may not be the best because it would arguably work equally well on humans.
Did the perturbed image of the cat in the article look like a desktop computer to you?

The point is that humans would see one thing whereas computers would be highly confident it is something else.

Only if the adversarial image printed doesn't look like the stop sign, though the example in this article shows that it's entirely possible to make an image that just looks like a distorted/badly-printed kitten to a human but completely different to a computer. A similar image for a stop sign might just look like wear in the paint or weird reflections or something but still look like a stop sign to a human.
Yes, but won't we still notice that self-driving cars aren't stopping at the stop sign? And then we'd investigate.
You could wear special adversarial clothing for example, or even just project adversarial images onto pavement, walls, poles, road signs, and other reflective surfaces.
You could also throw nails onto a highway out your car window. I'm not sure why someone would, though.
Or throw rocks from bridges into highway traffic. Teenagers occasionally do that.
Why limit yourself to self driving cars? A smart malicious actor would just throw oil out their window on a highway. Watch all the cars crash!!

I think these adversarial examples are nearly irrelevant for self-driving cars. If someone does something bad, we prosecute them. It's the same whether you're throwing oil onto a highway, covering up stop signs with adversarial stop signs, or whatever else you might want to do.

Now if there were an exploit that caused all self-driving cars in the whole country to suddenly crash into walls, that would be one thing. But these image-based attacks are limited to a single intersection or road at a time. And after a single car crashes, the intersection gets closed. So if you really want to kill a few people, why not just go and stab them in the neck?

Or you could make a brick wall look like a flying plastic bag!
That's another fundamental problem in itself: your car doesn't have much reasoning ability or knowledge of the world, so it can't tell if it's a flying plastic bag or a large boulder.
I doubt self-driving cars would rely on a single network fed by a single image source (I hope not!).

Robust systems expect that some of the inferences can be mistaken (noisy). That's why you want to run multiple sensor types into different models, and use some kind of mixture of experts +/- probabilistic fusion.
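
A minimal sketch of what probabilistic fusion could look like, assuming each sensor's model emits per-class probabilities and that the sensors err independently. The sensor names, the numbers, and the `fuse` helper are all hypothetical, not taken from any real vehicle stack:

```python
from math import prod


def fuse(per_sensor_probs, floor=1e-3):
    """Naive-Bayes style fusion with a uniform prior.

    Multiplies the per-class probabilities reported by each sensor
    (assumed conditionally independent) and renormalizes. `floor` stops
    a single sensor from vetoing a class with a hard zero.
    """
    classes = per_sensor_probs[0].keys()
    scores = {
        c: prod(max(p[c], floor) for p in per_sensor_probs) for c in classes
    }
    total = sum(scores.values())
    return {c: s / total for c, s in scores.items()}


# Hypothetical readings: the camera has been fooled, the others have not.
camera = {"stop_sign": 0.05, "tree": 0.95}
lidar = {"stop_sign": 0.90, "tree": 0.10}
map_prior = {"stop_sign": 0.85, "tree": 0.15}

fused = fuse([camera, lidar, map_prior])
print(fused)
```

Here the fooled camera is outvoted and `stop_sign` comes out ahead (about 0.73). The independence assumption is doing the work: the sketch covers one corrupted sensor, not an input crafted against the fused output.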

It doesn't matter how many algorithms or sensors are consulted or combined to form a judgment. If an attacker can obtain a self-driving vehicle's hardware, and if enough tests can be performed per second, the attacker can train images that fool it.

Your idea is similar to an appeal to security through obscurity. Might work sometimes, but not generally.

(Noise does not help, because you can still discover a gradient to descend by averaging repeated trials.)
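
To illustrate the averaging argument, here is a toy sketch. `noisy_score` is a made-up one-dimensional stand-in for a black-box system whose output can only be observed with Gaussian noise; the step size, trial count, and learning rate are arbitrary choices:

```python
import random


def noisy_score(x, rng, noise=0.5):
    """Hypothetical black box: the true score is (x - 3)^2, but every
    query returns it with Gaussian noise added."""
    return (x - 3.0) ** 2 + rng.gauss(0.0, noise)


def estimate_gradient(f, x, rng, h=0.5, trials=2000):
    """Average many noisy central-difference estimates of df/dx.

    Each single estimate is dominated by noise; the mean of `trials`
    estimates has its noise reduced by a factor of sqrt(trials).
    """
    total = 0.0
    for _ in range(trials):
        total += (f(x + h, rng) - f(x - h, rng)) / (2 * h)
    return total / trials


rng = random.Random(0)
grad_at_zero = estimate_gradient(noisy_score, 0.0, rng)  # true value is -6

# Plain gradient descent using only the averaged noisy estimates.
x = 0.0
for _ in range(50):
    x -= 0.1 * estimate_gradient(noisy_score, x, rng)

print(grad_at_zero, x)
```

The descent ends close to the true minimizer at `x = 3` even though no single query is trustworthy. The cost is query volume, which is why the number of tests per second matters.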

You assume that all sensors are working on 'images' and that the algorithms are all using gradient descent.
Replace 'images' with 'sensor data' and adversarial examples can still be generated. They might not be as easy to feed into the vehicle's hardware (e.g. requiring speakers to fool an acoustic sensor), but the same principles apply.

It's also not necessary for the recognition algorithms to use gradient descent. So long as they are differentiable (or can be approximated by a model that is), you can use gradient descent to find adversarial examples.
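
As a rough sketch of that idea, the snippet below runs signed gradient steps on the *input* of a tiny logistic model until its prediction flips. The weights, bias, starting input, and step size are invented for illustration, and the model stands in for whatever differentiable recognizer (or differentiable approximation of one) is being attacked:

```python
import math


def predict(w, b, x):
    """Probability of the positive class under a logistic model."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def input_gradient(w, b, x):
    """Gradient of the predicted probability with respect to the input x."""
    p = predict(w, b, x)
    return [p * (1.0 - p) * wi for wi in w]


def sign(v):
    return (v > 0) - (v < 0)


def find_adversarial(w, b, x, step=0.05, max_iters=100):
    """Move x against the sign of the input gradient, one small step at
    a time, until the model no longer predicts the positive class."""
    x = list(x)
    for _ in range(max_iters):
        if predict(w, b, x) < 0.5:
            break
        g = input_gradient(w, b, x)
        x = [xi - step * sign(gi) for xi, gi in zip(x, g)]
    return x


# Hypothetical model parameters and input.
w, b = [1.0, -2.0, 0.5, 1.5], 0.2
x = [0.6, -0.2, 0.4, 0.3]
x_adv = find_adversarial(w, b, x)

print(predict(w, b, x), predict(w, b, x_adv))
```

Note that the model's weights are never updated; gradient descent is applied only to the input, which is why it does not matter how the model itself was trained.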

Adversarial examples exist for any model with a high input dimension (in relation to the available training data); differentiability only helps with finding them.
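
One way to see the role of dimension is with a purely linear score, which is a simplifying assumption and covers only the dimension part of the claim, not the training-data part. If every input coordinate may move by at most `eps`, the score `w.x` can shift by `eps * sum(|w_i|)`, which grows with the number of inputs. The weights below are hypothetical:

```python
import random


def max_score_shift(weights, eps):
    """Largest change in a linear score w.x when each input coordinate
    may move by at most eps. It equals eps * sum(|w_i|) and is reached
    by the perturbation eps * sign(w)."""
    return eps * sum(abs(w) for w in weights)


rng = random.Random(0)
eps = 0.01  # a per-coordinate change far too small to notice
shifts = {}
for n in (10, 1_000, 100_000):
    # Hypothetical weights of magnitude 1 with random signs.
    w = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    shifts[n] = max_score_shift(w, eps)

print(shifts)
```

The same imperceptible per-coordinate change that barely moves a 10-input model moves the 100,000-input model's score by about 1000.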

That's really interesting, though. You might be able to combine something like Random Forest with Neural Nets to make them more robust to adversarial images.