by dantle 2423 days ago
I'm a bit surprised that this works. The attack appears to target a single microphone, but most of these home assistant devices (e.g. Echo or HomePod) contain an array of microphones. A real spoken word would probably show up on more than one microphone (with an appropriate time and phase offset), although the devices do not currently seem to require that. It seems like it would be complex or impossible for an attacker to target more than one microphone with this attack.

This is covered in the paper. They acknowledge this defense technique while also pointing out that a laser flashlight could be used to illuminate all microphones at once.
The power output would need to be much higher, and I'm not sure it would work as well through glass. I'm thinking of something more like multiple lasers?
Perhaps the "all at once" attack could work against today's hardware. This is because the mics (in devices I know of) are co-planar and the user may be speaking to Alexa (or whoever) from directly above or below the device. In this configuration, it is valid for all mics to be receiving the same audio signal simultaneously.

But in some future rev, one could imagine that if the mics in the array are non-coplanar (which takes at least 4 mics) and sufficiently far from each other, then there is no direction from which a distant sound could reach them all at once (so a simultaneous signal would mean it is actually light being measured).
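To put rough numbers on the co-planar vs. non-coplanar point, here is a quick sketch under stated assumptions: the source is far away (plane wave), the array radius of 4 cm is made up, and the two layouts (a flat ring and a regular tetrahedron) are hypothetical, not taken from any real device. It computes the spread between the earliest and latest arrival time across the mics.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air


def arrival_spread(mics, direction):
    """Spread (latest - earliest) of plane-wave arrival times across mics, in seconds.

    `direction` is a unit vector pointing from the array toward the distant source.
    """
    # A mic further along `direction` is closer to the source and hears the wave earlier.
    times = [
        -(x * direction[0] + y * direction[1] + z * direction[2]) / SPEED_OF_SOUND
        for x, y, z in mics
    ]
    return max(times) - min(times)


def sphere_directions(n=2000):
    """Roughly uniform unit vectors on the sphere (Fibonacci lattice)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    dirs = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(1.0 - z * z)
        dirs.append((r * math.cos(golden_angle * i), r * math.sin(golden_angle * i), z))
    return dirs


R = 0.04  # hypothetical array radius in meters (an assumption, not a real device spec)

# Four co-planar mics on a ring in the z = 0 plane.
planar = [(R * math.cos(math.pi * k / 2), R * math.sin(math.pi * k / 2), 0.0) for k in range(4)]

# Four non-coplanar mics: a regular tetrahedron with vertices at distance R from the center.
s = R / math.sqrt(3.0)
tetra = [(s, s, s), (s, -s, -s), (-s, s, -s), (-s, -s, s)]

planar_from_above = arrival_spread(planar, (0.0, 0.0, 1.0))
tetra_min = min(arrival_spread(tetra, d) for d in sphere_directions())

print(f"planar ring, source directly above: {planar_from_above * 1e6:.1f} us")
print(f"tetrahedron, best case over all sampled directions: {tetra_min * 1e6:.1f} us")
```

With the flat ring, a source directly above gives zero spread, so identical signals on every mic are legitimate. With the tetrahedron the spread never drops below 2s/c, which is about 135 microseconds for these made-up dimensions. The plane-wave assumption matters here: a source sitting exactly at the center of the array would still be equidistant from all four mics, but that point is inside the device.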

You could add timing differences to the individual lasers as necessary. It's not really more complicated than duplicating the laser setup and feeding each copy the same signal with a different time delay. Not a huge step.
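The delay math is indeed simple. A minimal sketch, assuming one laser per mic, a plane-wave model of the sound being faked, and a hypothetical 4 cm ring layout (`spoof_delays` and the geometry are illustrative names and values, not from the paper):

```python
SPEED_OF_SOUND = 343.0  # m/s, approximate value for room-temperature air


def spoof_delays(mics, direction, speed=SPEED_OF_SOUND):
    """Per-mic delays in seconds that mimic a distant sound arriving from `direction`.

    `direction` is a unit vector from the array toward the pretended source.
    The earliest mic gets delay 0; every other mic is delayed relative to it.
    """
    raw = [
        -(x * direction[0] + y * direction[1] + z * direction[2]) / speed
        for x, y, z in mics
    ]
    earliest = min(raw)
    return [t - earliest for t in raw]


# Hypothetical ring of four mics, 4 cm from the center (an assumption).
mics = [(0.04, 0.0, 0.0), (-0.04, 0.0, 0.0), (0.0, 0.04, 0.0), (0.0, -0.04, 0.0)]

# Pretend the speaker is far away along the +x axis.
delays = spoof_delays(mics, (1.0, 0.0, 0.0))
for mic, d in zip(mics, delays):
    print(mic, f"{d * 1e6:.1f} us")
```

Each laser would then be modulated with the same audio, shifted by its mic's delay. This only covers timing; it says nothing about whether every mic can actually be hit from one vantage point.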

However, non-coplanar mics would work for the opposite reason: if they are on different sides of the device, you couldn't reach all of them from the same distant location. So unless all mics receive (more or less) similar sound signals, the device could discard the input as manipulation.
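A rough sketch of what such a similarity check might look like, assuming the device can see raw per-mic samples. The function names, the lag window, and the 0.5 correlation threshold are all hypothetical; a real device would need to tune them, since the device body also shadows genuine sound.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def best_correlation(a, b, max_lag):
    """Peak normalized cross-correlation of a and b over lags in [-max_lag, max_lag]."""
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(a[i], b[i + lag]) for i in range(len(a)) if 0 <= i + lag < len(b)]
        num = sum(p * q for p, q in pairs)
        den = math.sqrt(sum(p * p for p, _ in pairs) * sum(q * q for _, q in pairs))
        if den > 0:
            best = max(best, num / den)
    return best


def looks_manipulated(channels, max_lag=4, min_corr=0.5):
    """Flag the input if any mic fails to correlate with the loudest mic."""
    ref = max(channels, key=rms)
    return any(
        best_correlation(ref, ch, max_lag) < min_corr
        for ch in channels
        if ch is not ref
    )


def delayed(signal, k):
    """Shift a signal later by k samples, padding the start with silence."""
    return [0.0] * k + signal[: len(signal) - k]


# Synthetic test data: a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(400)]

# Genuine sound: every mic hears the tone, each a few samples later than the last.
genuine = [delayed(tone, k) for k in range(4)]

# Single-mic injection: one mic carries the tone, the rest only low-level noise.
rng = random.Random(0)
spoofed = [tone] + [[rng.gauss(0.0, 0.01) for _ in tone] for _ in range(3)]

print(looks_manipulated(genuine))
print(looks_manipulated(spoofed))
```

The check only asks that every mic carries a delayed copy of roughly the same waveform, so it would not catch the multi-laser variant that hits every mic with correctly delayed signals.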

What about internal microphones?
You could require a signal from >1 microphone, but then that would make the system less reliable if any of the mics were occluded. And this would be to prevent an attack which is kind of ridiculous in the first place.