Hacker News
by rachitgupta 3211 days ago
This will be fixed with a simple software update that ignores all sounds at inaudible frequencies.
6 comments

It's happening at the hardware level, so there is potentially limited scope to fix it in software. My guess is that when the author refers to "harmonics" they are really talking about intermodulation.

The idea is that if you want to create a frequency of "A", you can emit two powerful tones at frequencies "B" and "B+A", where the frequency B is high enough to be out of hearing range. The non-linearity of the microphone means the two tones mix together to produce a number of other frequencies, including the frequency "B+A"-"B" = "A".
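
For anyone who wants to see the effect numerically, here is a rough sketch, not a model of any real microphone. It assumes a toy response y = x + a2*x^2 with a made-up coefficient a2, and arbitrary example frequencies (B = 30 kHz, A = 1 kHz). It measures how much energy ends up at frequency A with and without the squared term.

```python
import math

def tone_amplitude(samples, freq_hz, sample_rate_hz):
    """Estimate the amplitude of one frequency by correlating with cos/sin."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / sample_rate_hz)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

SAMPLE_RATE = 192_000  # Hz, high enough to represent the ultrasonic tones
A = 1_000              # Hz, the audible tone we want to appear (example value)
B = 30_000             # Hz, ultrasonic tone (example value)
N = 19_200             # 0.1 s of samples, so every tone completes whole cycles

def two_tones(i):
    """The emitted signal: equal-strength tones at B and B+A."""
    t = i / SAMPLE_RATE
    return math.cos(2 * math.pi * B * t) + math.cos(2 * math.pi * (B + A) * t)

def microphone(x, a2):
    """Toy non-linear response y = x + a2*x^2 (a2 is a hypothetical coefficient)."""
    return x + a2 * x * x

linear = [microphone(two_tones(i), 0.0) for i in range(N)]
nonlinear = [microphone(two_tones(i), 0.1) for i in range(N)]

amp_linear = tone_amplitude(linear, A, SAMPLE_RATE)
amp_nonlinear = tone_amplitude(nonlinear, A, SAMPLE_RATE)
print(f"amplitude at {A} Hz, linear mic:     {amp_linear:.4f}")
print(f"amplitude at {A} Hz, non-linear mic: {amp_nonlinear:.4f}")
```

With a2 = 0 nothing shows up at A. With the squared term, the cross product of the two tones contributes a component at (B+A) - B = A whose size scales with a2.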

Thus the conversion from ultrasonic to audible happens in the microphone itself, before the software has a chance to tell the difference. The mixing process typically produces frequencies other than "A", so there might be hope of a countermeasure if the microphone is able to pick up these other frequencies and the software is smart enough to use them to figure out that an attack is in progress. But it's not a simple case of just filtering out a particular frequency, and an intelligent choice of ultrasonic frequencies may leave only a single frequency in the band of the microphone.

It's the same principle that is used in ultrasonic beamforming speakers. That adds another element of stealth to the attack, in that the high frequencies can allow the sound to be beamformed and illuminate the microphone and not much else.

> the author refers to "harmonics" they are really talking about intermodulation.

No, he's talking about harmonics. It's a different effect from intermodulation. It's true that intermodulation involves the sums and differences of two or more frequencies. Harmonics, however, involve integer multiples of a single frequency.

But the impact is the same as intermodulation in that it's really a hardware issue and cannot be countered using a simple frequency filter.

Harmonics are multiples of the fundamental, so in this case they will also be ultrasonic.
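
A quick bit of arithmetic makes the distinction concrete. This is only an illustration: the two tone frequencies are arbitrary examples, and the 20 kHz hearing limit is the usual rule of thumb rather than anything taken from the paper.

```python
def harmonics(f, max_order):
    """Integer multiples of a single frequency f (orders 2..max_order)."""
    return [n * f for n in range(2, max_order + 1)]

def sum_and_difference(f1, f2):
    """Frequencies produced when two tones mix: their difference and their sum."""
    return sorted({abs(f1 - f2), f1 + f2})

AUDIBLE_LIMIT = 20_000   # Hz, rule-of-thumb upper bound for human hearing
f1, f2 = 30_000, 31_000  # Hz, example ultrasonic tones (arbitrary values)

harm = harmonics(f1, 4) + harmonics(f2, 4)
mixed = sum_and_difference(f1, f2)

print("harmonics:", harm)
print("audible harmonics:", [f for f in harm if f < AUDIBLE_LIMIT])
print("mixing products:", mixed)
print("audible mixing products:", [f for f in mixed if f < AUDIBLE_LIMIT])
```

Every harmonic of an ultrasonic tone sits even higher than the tone itself, so none is audible. Only the 1,000 Hz difference frequency lands below the limit.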

Equation 2 in the paper and the subsequent paragraph show what is going on. They use an ultrasonic carrier with modulation. The non-linearity causes the carrier to mix with the sidebands, the third-order intermodulation product being a copy of the modulation centred on 0 Hz (i.e. a baseband signal).
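
Here is a small sketch of the carrier-plus-modulation case. It is a toy model and not a reproduction of the paper's Equation 2: the square-law term with coefficient 0.1, the 30 kHz carrier, and the 400 Hz modulation tone are all assumptions picked for illustration.

```python
import math

SAMPLE_RATE = 192_000  # Hz
CARRIER = 30_000       # Hz, ultrasonic carrier (arbitrary example)
MOD = 400              # Hz, stand-in for a voice-band modulation signal
DEPTH = 0.5            # modulation depth (arbitrary example)
N = 19_200             # 0.1 s of samples, whole cycles for every component

def am_signal(i):
    """Amplitude-modulated carrier: (1 + DEPTH*cos(mod)) * cos(carrier)."""
    t = i / SAMPLE_RATE
    envelope = 1 + DEPTH * math.cos(2 * math.pi * MOD * t)
    return envelope * math.cos(2 * math.pi * CARRIER * t)

def amplitude_at(samples, freq_hz):
    """Amplitude of a single frequency, found by correlating with cos/sin."""
    n = len(samples)
    re = sum(s * math.cos(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    im = sum(s * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
             for i, s in enumerate(samples))
    return 2 * math.hypot(re, im) / n

transmitted = [am_signal(i) for i in range(N)]
# Hypothetical non-linear front end: y = x + 0.1 * x^2
received = [x + 0.1 * x * x for x in transmitted]

print(f"{MOD} Hz content in the transmitted signal: {amplitude_at(transmitted, MOD):.4f}")
print(f"{MOD} Hz content after the non-linearity:  {amplitude_at(received, MOD):.4f}")
```

The transmitted signal only has energy at the carrier and its two sidebands, so nothing is measured at 400 Hz. After squaring, the carrier mixes with the sidebands and a copy of the modulation appears at baseband.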

Edit: Figure 12 talks about harmonics, in the context of harmonics of the third-order intermodulation product. What they are really referring to are the higher-order (5th, 7th, and so on) intermodulation products, which in this case will be multiples of the third-order product's frequency.

> The idea is that if you want to create a frequency of "A", you can emit two powerful tones at frequencies "B" and "B+A", where the frequency B is high enough to be out of hearing range. The non-linearity of the microphone means the two tones mix together to produce a number of other frequencies, including the frequency "B+A"-"B" = "A".

Does this work for ears, too?

If so, are the non-linearities of different people's ears similar enough that two people hearing the same A and B would get the same results, or would person to person variations in non-linearity mean they might hear different results?

Yes it does:

https://makezine.com/2008/10/08/homebrew-parametric-speak/

http://www.soundlazer.com/

It's reasonably consistent. Differences in non-linearity will result in different amplitudes for each intermodulation product, but not different frequencies. Typically these systems use the "third order" product. I gather that the non-linearity exploited is as much a property of the air as the ear.

I wonder if this has any implications for recording?

If someone is listening to a live musical instrument that is producing both audible sound and ultrasonic sound [1], is what the person perceives affected by intermodulation in the ear?

If the performance is also recorded using a technology that for all practical purposes reproduces perfectly everything in the audible range, then I can see a couple possible cases.

1. The microphone is designed to filter out ultrasonics or is sufficiently linear to not have intermodulation.

In this case, the recording is what would be heard with no intermodulation. When played back, all the listener gets is the audible portion of the original sound, without any ultrasonics. Thus there is nothing to produce intermodulation in the listener's ear, and so the listener might perceive the recording as having a different timbre than the live instrument.

2. The microphone does not filter ultrasonics and is non-linear enough to have intermodulation. The audible intermodulation products will then be included in the recording.

When played back the listener will hear intermodulation products, but they will be the ones from the microphone's non-linearity, not the ear's non-linearity.

The question then is how close are microphone non-linearities to ear non-linearities. If they are similar, then the timbre of the recording should match live. If they are sufficiently different, the timbre could sound off.

It should be possible to design a system that records only audible frequencies and plays back only audible frequencies and sounds identical to live, but it may require specifically taking into account ultrasonics instead of just cutting them out like I think we currently do.

[1] A trumpet with a Harmon mute playing a quiet note has about 2% of its energy above 20 kHz. Playing a loud note drops that to about 0.5%. A cymbal crash is about 40% above 20 kHz. (Keys jangling are almost 70% above 20 kHz, which probably has something to do with why, back in the early days of TV remote controls when they were ultrasonic instead of IR or RF, people would report that the channel would sometimes change if someone's keys jangled.) See: https://www.cco.caltech.edu/~boyk/spectra/spectra.htm

There are (marginally) commercialized ultrasonic speakers:

https://en.wikipedia.org/wiki/Sound_from_ultrasound

The air acts as the demodulator though.

Shaping the ultrasound to modulate the eardrum sounds scary.

If I understand this correctly, the phone cannot tell if the audio is outside of human hearing range. The point of the LPF is to filter out all audio that is outside of that range.

The attack they are using is transmitting the audio at a high frequency, that when detected by the microphone generates harmonics that are within the normal range and can pass through the filter. By the time the audio signal gets to the processor, it is within audible range.

> This is MUCH bigger deal than most understand.

>> This will be fixed with a simple software update that ignores all sounds at inaudible frequencies.

In the most helpful and constructive way I can possibly say this directly: the group of individuals not understanding may include you, rachitgupta.

According to my very limited understanding, the attack occurs in hardware prior to digitization. See yesterday's discussion for more details.

If you were to read the article you'd see this isn't how it works. They're inducing harmonic signals within the speech passband.

Except the inaudible sounds are used for marketing purposes. Most companies aren't going to want to just close that door.

I'm genuinely curious - can you share a link how it works please? I've never heard of it.

Watch the 33c3 presentation linked elsewhere in this discussion, or just skip to solutions/Q&A: https://youtu.be/WW1-xnTIDjQ?t=35m05s

You can also read the paper: https://petsymposium.org/2017/papers/issue2/paper18-2017-2-s...

Here is the blurb from their talk:

Cross-device tracking (XDT) technologies are currently the "Holy Grail" for marketers because they allow to track the user's visited content across different devices to then push relevant, more targeted content. For example, if a user clicks on a particular advertisement while browsing the web at home, the advertisers are very interested in collecting this information to display, later on, related advertisements on other devices belonging to the same user (e.g., phone, tablet).

Currently, the most recent innovation in this area is ultrasonic cross-device tracking (uXDT), which is the use of the ultrasonic spectrum as a communication channel to "pair" devices for the aforementioned tracking purposes. Technically, this pairing happens through a receiver application installed on the phone or tablet. The business model is that users will receive rewards or useful services for keeping those apps active, pretty much like it happens for proximity-marketing apps (e.g., Shopkick), where users receive deals for walk-ins recorded by their indoor-localizing apps.

-- https://www.blackhat.com/eu-16/briefings.html#talking-behind...

Not sure if this will help, but I was surprised by this: https://arstechnica.com/tech-policy/2015/11/beware-of-ads-th...

SilverPush has since stopped, it seems, but that might just mean others are doing it more profitably than they were...

I know of traditional media ad-tech that uses ultrasound markers embedded in ads to track/verify whether the ads were really broadcast as promised (the right number of times and in the correct time slots).

So the steps are:

1. Inject ultrasound markers into ad during post-production.

2. Have a server with multiple tuner cards to monitor multiple stations, grab the audio.

3. Filter audio on specific ultrasound frequencies, search for the pre-injected patterns.

4. Generate reports.

5. Get paid.
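
Steps 1 to 3 can be sketched roughly like this. It is a guess at the general shape, not how any actual product works: the 21 kHz marker frequency, the 48 kHz sample rate, the 10 ms frames, the on/off pattern, and the detection threshold are all hypothetical, and the detector assumes its frames line up with the marker frames. The Goertzel algorithm is used here as one common way to measure a single frequency cheaply.

```python
import math

SAMPLE_RATE = 48_000  # Hz, assumed capture rate
MARKER_HZ = 21_000    # Hz, hypothetical marker frequency
FRAME = 480           # samples per frame (10 ms)
PATTERN = [1, 0, 1, 1, 0, 0, 1, 1]  # hypothetical on/off marker pattern

def goertzel_amplitude(frame, freq_hz):
    """Amplitude of one frequency in a frame, via the Goertzel algorithm."""
    n = len(frame)
    k = round(n * freq_hz / SAMPLE_RATE)  # nearest DFT bin
    coeff = 2 * math.cos(2 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in frame:
        s1, s2 = x + coeff * s1 - s2, s1
    power = s1 * s1 + s2 * s2 - coeff * s1 * s2
    return 2 * math.sqrt(max(power, 0.0)) / n

def synth_audio(frame_flags):
    """Step 1 stand-in: a 1 kHz 'programme' tone plus a quiet marker in flagged frames."""
    audio = []
    for f, flag in enumerate(frame_flags):
        for j in range(FRAME):
            t = (f * FRAME + j) / SAMPLE_RATE
            x = 0.5 * math.sin(2 * math.pi * 1_000 * t)
            if flag:
                x += 0.05 * math.sin(2 * math.pi * MARKER_HZ * t)
            audio.append(x)
    return audio

def marker_bits(audio, threshold=0.02):
    """Step 3a: per-frame decision on whether the marker frequency is present."""
    frames = [audio[i:i + FRAME] for i in range(0, len(audio) - FRAME + 1, FRAME)]
    return [1 if goertzel_amplitude(fr, MARKER_HZ) > threshold else 0 for fr in frames]

def find_pattern(bits, pattern):
    """Step 3b: index of the first frame where the pattern starts, or -1."""
    for offset in range(len(bits) - len(pattern) + 1):
        if bits[offset:offset + len(pattern)] == pattern:
            return offset
    return -1

# Step 2 stand-in: "captured" audio with the marker starting at frame 3.
audio = synth_audio([0, 0, 0] + PATTERN + [0, 0])
bits = marker_bits(audio)
print("detected bits:", bits)
print("pattern starts at frame:", find_pattern(bits, PATTERN))
```

A real monitor would also need to handle frame synchronisation, noise, and lossy broadcast codecs, none of which is attempted here.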

The post specifically addresses why it's not that simple - they're taking advantage of harmonics to generate signal in the audible frequencies on the microphone itself.