Hacker News
by femto 3211 days ago
It's happening at the hardware level, so there is potentially limited scope to fix it in software. My guess is that when the author refers to "harmonics" they are really talking about intermodulation.

The idea is that if you want to create a frequency of "A", you can emit two powerful tones at frequencies "B" and "B+A", where the frequency B is high enough to be out of hearing range. The non-linearity of the microphone means the two tones mix together to produce a number of other frequencies, including the frequency "B+A"-"B" = "A".
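Here is a minimal numerical sketch of the two-tone idea. It assumes a toy microphone modeled as a linear response plus a small square-law term. The coefficient 0.1, the 192 kHz sample rate and the 30 kHz / 1 kHz values are illustrative choices, not taken from the paper. It measures the amplitude at A before and after the non-linearity using a single DFT bin:

```python
import cmath
import math

FS = 192_000  # assumed sample rate in Hz, comfortably above twice the highest product frequency
N = 1_920     # 10 ms window, so every multiple of 100 Hz lands exactly on a DFT bin


def tone(freq_hz, amplitude=1.0):
    """N samples of a cosine at freq_hz."""
    return [amplitude * math.cos(2 * math.pi * freq_hz * n / FS) for n in range(N)]


def amplitude_at(signal, freq_hz):
    """Amplitude of the cosine component at freq_hz, from a single DFT bin.

    Exact only when freq_hz is a nonzero multiple of FS / N (100 Hz here)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def toy_microphone(x, a2=0.1):
    """Hypothetical microphone: linear response plus a small square-law term."""
    return [s + a2 * s * s for s in x]


B, A = 30_000, 1_000  # B is ultrasonic; A is the audible frequency we want to create
x = [p + q for p, q in zip(tone(B), tone(B + A))]  # emit only B and B + A

print(f"{amplitude_at(x, A):.3f}")                  # 0.000: nothing at A is transmitted
print(f"{amplitude_at(toy_microphone(x), A):.3f}")  # 0.100: the product at (B + A) - B = A
```

The same square term also produces 2B, 2B + A and 2(B + A). All three are ultrasonic, so in this toy model only A lands in the audible band.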

Thus the conversion from ultrasonic to audible happens in the microphone itself, before the software has any chance to tell the difference. The mixing process typically produces frequencies other than "A", so there might be hope for a countermeasure if the microphone can pick up these other frequencies and the software is smart enough to use them to detect that an attack is in progress. It's not simply a case of filtering out a particular frequency, and an intelligent choice of ultrasonic frequencies may leave only a single frequency in the microphone's band.
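To see which mixing products land in the audible band, the sketch below enumerates the frequencies |m·f1 + n·f2| up to a given order. It uses the usual convention that the order of a product is |m| + |n|. The 30/31 kHz tone pair and the 20 Hz to 20 kHz band are assumptions for illustration. A real microphone's response and the actual non-linearity determine which of these products are strong enough to matter.

```python
from itertools import product


def intermod_products(f1, f2, max_order):
    """Map each product frequency |m*f1 + n*f2| to the lowest order |m| + |n| that makes it."""
    found = {}
    for m, n in product(range(-max_order, max_order + 1), repeat=2):
        order = abs(m) + abs(n)
        if not 1 <= order <= max_order:
            continue
        freq = abs(m * f1 + n * f2)
        if freq > 0 and order < found.get(freq, max_order + 1):
            found[freq] = order
    return found


def in_band(products, lo_hz=20, hi_hz=20_000):
    """Product frequencies that fall inside [lo_hz, hi_hz], sorted."""
    return sorted(f for f in products if lo_hz <= f <= hi_hz)


f1, f2 = 30_000, 31_000
print(in_band(intermod_products(f1, f2, 3)))  # [1000]
print(in_band(intermod_products(f1, f2, 4)))  # [1000, 2000]
# Third-order products just above the audible band:
print(sorted(f for f, o in intermod_products(f1, f2, 3).items() if o == 3 and f < 40_000))
# [29000, 32000]
```

In this enumeration, the 29 kHz and 32 kHz products sit just outside the audible band. A microphone with enough ultrasonic bandwidth to capture them would have the kind of tell a countermeasure might look for.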

It's the same principle that is used in ultrasonic beamforming speakers. That adds another element of stealth to the attack: the high frequencies allow the sound to be beamformed so that it illuminates the microphone and not much else.


> the author refers to "harmonics" they are really talking about intermodulation.

No, he's talking about harmonics, which are a different effect from intermodulation. It's true that intermodulation involves the sums and differences of two or more frequencies. Harmonics, however, are integer multiples of a single frequency.

But the impact is the same as with intermodulation: it's really a hardware issue and cannot be countered with a simple frequency filter.

Harmonics are multiples of the fundamental, so in this case they will also be ultrasonic.
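Here is a quick numerical check of this point. It assumes a toy memoryless non-linearity with square and cube terms; the coefficients and the 25 kHz tone are arbitrary choices. A single ultrasonic tone only gains components at integer multiples of itself, so nothing lands in the audible band.

```python
import cmath
import math

FS = 192_000  # assumed sample rate, Hz
N = 1_920     # 10 ms window -> 100 Hz bin spacing


def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (exact for nonzero multiples of FS / N)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def nonlinear(x, a2=0.1, a3=0.05):
    """Hypothetical memoryless non-linearity with square and cube terms."""
    return [s + a2 * s**2 + a3 * s**3 for s in x]


F0 = 25_000  # a single ultrasonic tone
y = nonlinear([math.cos(2 * math.pi * F0 * n / FS) for n in range(N)])

for k in (1, 2, 3):
    print(k * F0, f"{amplitude_at(y, k * F0):.4f}")  # 25000 1.0375, 50000 0.0500, 75000 0.0125
audible_peak = max(amplitude_at(y, f) for f in range(100, 20_001, 100))
print(f"{audible_peak:.4f}")  # 0.0000: no component anywhere from 100 Hz to 20 kHz
```

An audible difference frequency only appears once a second tone is present to mix with.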

Equation 2 in the paper and the subsequent paragraph show what is going on. They use an ultrasonic carrier with modulation. The non-linearity causes the carrier to mix with the sidebands, the third-order intermodulation product being a copy of the modulation centred on 0Hz (i.e. a baseband signal).
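Below is a toy version of that mechanism, not the paper's exact model. A 1 kHz tone is amplitude-modulated onto an assumed 30 kHz carrier with modulation depth 0.5, then passed through a square-law term. The coefficient A2 is a made-up value. The transmitted signal has no energy at 1 kHz, but the output does, with amplitude A2 × depth in this model.

```python
import cmath
import math

FS = 192_000  # assumed sample rate, Hz
N = 1_920     # 10 ms window -> 100 Hz bin spacing


def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (exact for nonzero multiples of FS / N)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


FC, FM, DEPTH, A2 = 30_000, 1_000, 0.5, 0.1  # carrier, message, modulation depth, square-law coeff

# Amplitude-modulated ultrasonic carrier: (1 + m*cos(2*pi*FM*t)) * cos(2*pi*FC*t)
x = [(1 + DEPTH * math.cos(2 * math.pi * FM * n / FS)) * math.cos(2 * math.pi * FC * n / FS)
     for n in range(N)]
y = [s + A2 * s * s for s in x]  # hypothetical microphone non-linearity

print(f"{amplitude_at(x, FM):.3f}")  # 0.000: the transmitted signal has nothing at 1 kHz
print(f"{amplitude_at(y, FM):.3f}")  # 0.050: baseband copy of the message, A2 * DEPTH
```

The same square term also leaves a smaller component at 2·FM (A2·DEPTH²/4 in this model), plus ultrasonic products around 2·FC.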

Edit: Figure 12 talks about harmonics, in the context of harmonics of the third-order intermodulation product. What they are really referring to are the higher-order (5th, 7th, and so on) intermodulation products, which in this case will be at multiples of the third-order product's frequency.
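The following toy example illustrates products at multiples of the demodulated frequency. It assumes a made-up non-linearity with square and fourth-power terms, applied to the same kind of AM signal (1 kHz message on a 30 kHz carrier). Scanning the audible band finds components only at integer multiples of 1 kHz. The assumed coefficients set how strong each one is, not where it sits.

```python
import cmath
import math

FS = 384_000  # assumed sample rate; high enough that 4th-power products near 124 kHz do not alias
N = 1_920     # 5 ms window -> 200 Hz bin spacing


def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (exact for nonzero multiples of FS / N)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


FC, FM, DEPTH = 30_000, 1_000, 0.5
A2, A4 = 0.1, 0.05  # hypothetical square and fourth-power coefficients

x = [(1 + DEPTH * math.cos(2 * math.pi * FM * n / FS)) * math.cos(2 * math.pi * FC * n / FS)
     for n in range(N)]
y = [s + A2 * s**2 + A4 * s**4 for s in x]

# Scan the audible band bin by bin and keep the bins that carry a real component.
peaks = [f for f in range(200, 20_001, 200) if amplitude_at(y, f) > 1e-6]
print(peaks)  # [1000, 2000, 3000, 4000]
```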

> The idea is that if you want to create a frequency of "A", you can emit two powerful tones at frequencies "B" and "B+A", where the frequency B is high enough to be out of hearing range. The non-linearity of the microphone means the two tones mix together to produce a number of other frequencies, including the frequency "B+A"-"B" = "A".

Does this work for ears, too?

If so, are the non-linearities of different people's ears similar enough that two people hearing the same A and B would get the same results, or would person-to-person variations in non-linearity mean they might hear different results?

Yes it does:

https://makezine.com/2008/10/08/homebrew-parametric-speak/

http://www.soundlazer.com/

It's reasonably consistent. Differences in non-linearity will result in different amplitudes for each intermodulation product, but not different frequencies. Typically these systems use the "third order" product. I gather that the non-linearity exploited is as much a property of the air as the ear.
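Here is a sketch of the "same frequencies, different amplitudes" point. It assumes each listener's combined air-and-ear non-linearity can be stood in for by a polynomial with made-up coefficients, and it uses an assumed 40/41 kHz tone pair. Changing the coefficients rescales the 1 kHz difference tone but does not move it.

```python
import cmath
import math

FS = 384_000  # assumed sample rate; cube-law products up to 123 kHz stay below Nyquist
N = 1_920     # 5 ms window -> 200 Hz bin spacing


def amplitude_at(signal, freq_hz):
    """Amplitude of the component at freq_hz (exact for nonzero multiples of FS / N)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(signal))
    return 2 * abs(acc) / len(signal)


def listen(x, a2, a3):
    """Hypothetical memoryless non-linearity standing in for one listener (air plus ear)."""
    return [s + a2 * s**2 + a3 * s**3 for s in x]


F1, F2 = 40_000, 41_000
x = [math.cos(2 * math.pi * F1 * n / FS) + math.cos(2 * math.pi * F2 * n / FS) for n in range(N)]

listeners = {"A": (0.10, 0.02), "B": (0.25, -0.05)}  # made-up coefficients
for name, (a2, a3) in listeners.items():
    y = listen(x, a2, a3)
    audible = [f for f in range(200, 20_001, 200) if amplitude_at(y, f) > 1e-6]
    print(name, audible, f"{amplitude_at(y, F2 - F1):.3f}")
# A [1000] 0.100
# B [1000] 0.250
```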

I wonder whether this has any implications for recording.

If someone is listening to a live musical instrument that is producing both audible sound and ultrasonic sound [1], is what the person perceives affected by intermodulation in the ear?

If the performance is also recorded using a technology that, for all practical purposes, perfectly reproduces everything in the audible range, then I can see a couple of possible cases.

1. The microphone is designed to filter out ultrasonics or is sufficiently linear not to produce intermodulation.

In this case, the recording captures what would be heard with no intermodulation. When played back, the listener gets only the audible portion of the original sound, without any ultrasonics. Thus there is nothing to produce intermodulation in the listener's ear, and so the listener might perceive the recording as having a different timbre from the live instrument.

2. The microphone does not filter ultrasonics and is non-linear enough to have intermodulation. The audible intermodulation products will then be included in the recording.

When played back the listener will hear intermodulation products, but they will be the ones from the microphone's non-linearity, not the ear's non-linearity.

The question then is how close are microphone non-linearities to ear non-linearities. If they are similar, then the timbre of the recording should match live. If they are sufficiently different, the timbre could sound off.
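Below is a rough simulation of three situations: live listening, case 1 and case 2. Every non-linearity is modeled as a hypothetical square-law term, and the recording chain is modeled as an ideal brick-wall cut at 20 kHz. The source is a 500 Hz tone plus ultrasonic partials at 30 kHz and 31.3 kHz, and all coefficients are made up. The quantity of interest is the 1.3 kHz difference tone reaching the listener.

```python
import cmath
import math

FS = 192_000     # assumed sample rate, Hz
N = 1_920        # 10 ms window -> 100 Hz bin spacing
CUTOFF = 20_000  # assumed recording bandwidth, Hz


def cosines(parts):
    """Sum of cosines from a {frequency_hz: amplitude} mapping."""
    return [sum(a * math.cos(2 * math.pi * f * n / FS) for f, a in parts.items())
            for n in range(N)]


def dft_bin(x, freq_hz):
    """Unnormalized DFT coefficient at freq_hz (a multiple of FS / N)."""
    return sum(s * cmath.exp(-2j * math.pi * freq_hz * n / FS) for n, s in enumerate(x))


def amplitude_at(x, freq_hz):
    return 2 * abs(dft_bin(x, freq_hz)) / len(x)


def square_law(x, a2):
    """Hypothetical non-linearity used for both microphone and ear: x + a2 * x**2."""
    return [s + a2 * s * s for s in x]


def band_limit(x):
    """Ideal brick-wall low-pass: rebuild x from its DFT bins at or below CUTOFF."""
    out = [0.0] * len(x)
    for f in range(0, CUTOFF + 1, FS // N):
        c = dft_bin(x, f) / len(x)
        weight = 1 if f == 0 else 2  # a bin and its negative-frequency mirror together
        for n in range(len(x)):
            out[n] += weight * (c * cmath.exp(2j * math.pi * f * n / FS)).real
    return out


def ear(x):
    return square_law(x, 0.1)  # made-up ear coefficient


source = cosines({500: 1.0, 30_000: 0.5, 31_300: 0.5})  # audible tone plus two ultrasonic partials

live = ear(source)                                 # listener in the room hears everything
case1 = ear(band_limit(source))                    # linear mic: ultrasonics simply discarded
case2 = ear(band_limit(square_law(source, 0.05)))  # non-linear mic: its own products get recorded

for label, sig in (("live", live), ("case 1", case1), ("case 2", case2)):
    print(label, f"{amplitude_at(sig, 1_300):.4f}")  # live 0.0250, case 1 0.0000, case 2 0.0126
```

In this toy setup, case 2 does reproduce a 1.3 kHz difference tone, but at roughly half the live level. That is because the assumed microphone coefficient (0.05) differs from the assumed ear coefficient (0.1), which is the "how close are the non-linearities" question in numerical form.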

It should be possible to design a system that records only audible frequencies, plays back only audible frequencies, and sounds identical to live, but it may require explicitly accounting for ultrasonics instead of just cutting them out, as I think we currently do.

[1] A trumpet with a Harmon mute playing a quiet note has about 2% of its energy above 20 kHz. Playing a loud note drops that to about 0.5%. A cymbal crash is about 40% above 20 kHz. (Jangling keys are almost 70% above 20 kHz. This probably has something to do with why, in the early days of TV remote controls, when they were ultrasonic rather than IR or RF, people reported that the channel would sometimes change when someone's keys jangled.) See: https://www.cco.caltech.edu/~boyk/spectra/spectra.htm
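For a sampled recording, a percentage like the ones above can be estimated from the power spectrum using Parseval's theorem. This is a generic sketch, not the method used on the Caltech page. It assumes the recording's sample rate is high enough to capture the ultrasonic content at all; a 44.1 kHz recording cannot contain anything above 22.05 kHz. The test signal is made up.

```python
import cmath
import math


def fraction_above(signal, fs, cutoff_hz):
    """Fraction of signal energy in frequency components above cutoff_hz.

    Uses a plain O(N^2) DFT over all N bins and Parseval's theorem."""
    n_samples = len(signal)
    total = above = 0.0
    for k in range(n_samples):
        coeff = sum(s * cmath.exp(-2j * math.pi * k * n / n_samples) for n, s in enumerate(signal))
        power = abs(coeff) ** 2
        freq = min(k, n_samples - k) * fs / n_samples  # fold negative-frequency bins onto |f|
        total += power
        if freq > cutoff_hz:
            above += power
    return above / total


FS, N = 96_000, 960  # assumed: 96 kHz sampling, 10 ms window
test = [math.cos(2 * math.pi * 1_000 * n / FS) + 0.5 * math.cos(2 * math.pi * 30_000 * n / FS)
        for n in range(N)]
share = fraction_above(test, FS, 20_000)
print(f"{share:.3f}")  # 0.200, i.e. 0.5**2 / (1**2 + 0.5**2)
```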

There are (marginally) commercialized ultrasonic speakers:

https://en.wikipedia.org/wiki/Sound_from_ultrasound

The air acts as the demodulator though.

Shaping the ultrasound to modulate the eardrum sounds scary.