by ddingus 1975 days ago
I would not trust that. Seriously.

Reasonably effective stream watermarking happens every day and is done in the human vocal range with almost no listener impact.

In radio, Arbitron has a system that works well within the lower audio range, even on AM radio, which typically has about 5 kHz of audio bandwidth.

They use a spectral masking technique able to encode ID bits into streams that can be decoded with portable devices.

It is called PPM, the Portable People Meter.

Frankly, this kind of thing would go unnoticed by pretty much all listeners.

From the PDF I linked:

[...]all watermarking technologies use the well-known perceptual principle of “masking,” which was first reported in the early 20th century and is a core technical basis for mp3, AAC, and a host of data-rate reduction schemes.

In simple language, a loud burst of energy at one frequency will deafen the human auditory system to certain other audio components at nearby frequencies for a period of time before, during, and after the loud signal.

Consider the following illustration: A tone burst at 1.1 kHz with an intensity of 0 dB will hide (make imperceptible) an added signal at 1.11 kHz with a level of -30 dB for a period of about 10 ms before the burst and as much as 50 ms after the burst. However, modern signal-processing techniques can still detect the existence of this added 1.11 kHz component even though the ear cannot.
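To make the "the ear cannot hear it, but signal processing can" point concrete, here is a minimal sketch (mine, not from the PDF). It builds the 1.1 kHz tone at 0 dB plus a 1.11 kHz component at -30 dB and measures the hidden component with the Goertzel algorithm. The 8 kHz sample rate and the 100 ms analysis window are assumptions chosen so both tones land on exact analysis bins; a real detector working on short bursts of program audio would need to be considerably more careful.

```python
import math

SAMPLE_RATE = 8000   # Hz, assumed for illustration
WINDOW = 800         # samples = 100 ms, so analysis bins are 10 Hz apart

def goertzel_amplitude(samples, sample_rate, freq):
    """Estimate the amplitude of a single frequency component (Goertzel)."""
    n = len(samples)
    k = round(freq * n / sample_rate)        # nearest analysis bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    power = s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2
    return 2.0 * math.sqrt(max(power, 0.0)) / n

masker_amp = 1.0                             # 0 dB reference tone at 1.1 kHz
added_amp = 10 ** (-30 / 20)                 # -30 dB relative to the masker

signal = [
    masker_amp * math.sin(2 * math.pi * 1100 * t / SAMPLE_RATE)
    + added_amp * math.sin(2 * math.pi * 1110 * t / SAMPLE_RATE)
    for t in range(WINDOW)
]

# Level of the hidden 1.11 kHz component, in dB relative to the masker.
added_db = 20 * math.log10(goertzel_amplitude(signal, SAMPLE_RATE, 1110))
# A neighbouring bin with no tone in it, for comparison.
empty_amp = goertzel_amplitude(signal, SAMPLE_RATE, 1120)
```

Under these idealized conditions `added_db` should come out very close to -30, while `empty_amp` is essentially zero, so the added tone is trivially separable from the masker even though it sits only 10 Hz away.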

This is the basis of PPM and other similar watermarking technologies that use masking for determining the frequencies and intensity of the data that can be added for the station-identifying watermark.

The PPM system constructs 10 spectral channels in the region from 1.0 kHz to 3.0 kHz. The original program audio energy in each channel is evaluated for its ability to mask an added component. If that masking energy is insufficient, nothing is added. Conversely, if the energy in a channel is large enough, a tone is injected, chosen from one of four possible frequencies within the channel. For example, the channel centered at 1058 Hz might have one of the following four frequencies injected: 1046, 1054, 1062, or 1070 Hz.
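A rough sketch of the per-channel decision described above, using the 1058 Hz channel from the example. The threshold value, the energy units, and the mapping of 2-bit symbols to the four frequencies are my assumptions for illustration; the PDF excerpt does not specify them.

```python
# The four candidate tones for the channel centered at 1058 Hz (from the quote).
CHANNEL_1058_TONES = (1046, 1054, 1062, 1070)

def choose_tone(symbol, masking_energy, threshold, tones=CHANNEL_1058_TONES):
    """Return the tone frequency to inject, or None if masking is too weak.

    symbol is a 2-bit value (0..3); the symbol-to-frequency order is assumed.
    masking_energy and threshold are in arbitrary, hypothetical units.
    """
    if not 0 <= symbol < len(tones):
        raise ValueError("symbol must be in 0..3")
    if masking_energy < threshold:
        return None              # program audio cannot hide a tone: add nothing
    return tones[symbol]

def decode_tone(freq, tones=CHANNEL_1058_TONES):
    """Map a detected tone frequency back to its 2-bit symbol."""
    return tones.index(freq)
```

For example, `choose_tone(2, masking_energy=1.0, threshold=0.5)` picks 1062 Hz, while the same call with `masking_energy=0.1` returns `None` and the channel is left untouched for that interval.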

Each of the four frequencies represents 2 bits of information. If we assume that this process repeats at a 500 ms rate, using all channels provides 40 bits per second or 2400 bits per minute of watermark code. Let’s further assume that a radio station is credited for a listener if any code is correctly detected within a 3-minute interval. With the very large number of encoded bits generated in 3 minutes (2400 x 3 = 7200 bits) and a station’s identification data needing perhaps only 50 bits, there is massive excess capacity for redundancy, error correction, and for audio that does not have enough high-frequency content for masking.
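The capacity figures above are easy to check. This small sketch just redoes the arithmetic from the quote; the function name and parameters are mine, and the 50-bit station ID is the quote's own rough guess.

```python
def watermark_bits(window_s, channels=10, bits_per_tone=2, period_ms=500):
    """Raw watermark bits available in window_s seconds if every channel is used."""
    symbols_per_channel = window_s * 1000 // period_ms
    return channels * bits_per_tone * symbols_per_channel

per_second = watermark_bits(1)        # 10 channels * 2 bits * 2 symbols/s
per_minute = watermark_bits(60)
per_credit_window = watermark_bits(180)

STATION_ID_BITS = 50                  # "perhaps only 50 bits", per the quote
redundancy = per_credit_window // STATION_ID_BITS
```

That gives 40 bits per second, 2400 per minute, and 7200 in the 3-minute window, or 144 complete copies of a 50-bit ID if every channel could carry a tone every time.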

https://blogalytics.typepad.com/files/a-technical-look-at-ar...

2 comments

So if masking is used, I assume compressing the audio with any modern compression scheme from MP3 up should defeat that, shouldn't it? They drop masked signals to save bandwidth.
My advice here would be to do some analysis on known watermarked audio, and/or dig into the patents, software, and firmware.

The first step is to ID marks successfully.

Only then can means and methods to evade the mark be trusted.

I would look really hard at what the radio industry has done. They faced very significant challenges as internet advertising rose up to dominate.

The incentives to get this right and make it very robust are all there, and the result is time-tested and proven in production today.

Depends. The Arbitron system works through the HD Radio codec, which is a wavelet codec. It is basically a hybrid MP3-type codec coupled with high-frequency reconstruction on the receiver side.

Interestingly, that literally means fake signals on the receiving end above 8 to 10 kHz! The cutoff was as low as 5 kHz when used for AM, and may still be. I have not kept up.

I could tell early on. It has improved a lot since then.

The Arbitron system appears robust. Noise, low signal quality, and so on do not generally affect it much. The effective bitrate needed is very low.

Given a larger sample of audio, it is likely to work.

A robust watermarking system will include some sort of error correction, so the answer is that it might. It depends on how much error the compression introduces.

A purpose-built algorithm designed to thwart watermarking, however, is far more likely to be successful than a compression algorithm that is designed to maintain the integrity of the audio.
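To illustrate why a little damage from compression may not be enough, here is a toy sketch of the simplest possible error correction: a repetition code with majority voting. This is an assumption for illustration only. Nothing in the thread says which code PPM or any other real watermark uses, and production systems would likely use something stronger. `None` stands for an interval where no bit could be read at all.

```python
def encode_repetition(bits, repeat):
    """Repeat every payload bit `repeat` times."""
    return [b for b in bits for _ in range(repeat)]

def decode_repetition(received, repeat):
    """Majority-vote each group of `repeat` symbols, ignoring erasures (None)."""
    decoded = []
    for i in range(0, len(received), repeat):
        group = [b for b in received[i:i + repeat] if b is not None]
        ones = sum(group)
        decoded.append(1 if 2 * ones > len(group) else 0)
    return decoded

payload = [1, 0, 1, 1]
received = encode_repetition(payload, 5)

# Simulate damage: two flipped bits and three unreadable ones.
received[0] = 0
received[7] = 1
received[1] = None
received[12] = None
received[13] = None

recovered = decode_repetition(received, 5)
```

Here `recovered` equals `payload` despite a quarter of the symbols being wrong or missing. Defeating the mark means pushing the error rate past what the code can absorb, which an attack aimed at the mark is better placed to do than a codec trying to preserve the audio.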

> period of about 10 ms before the burst

Does the human auditory system work in "batches" or do we "forget" the other signal once the burst comes in?

The phenomenon described by the quoted comment is called "temporal masking". There is "pre-masking", where a sound is rendered imperceptible by a sound that _follows_ it (your "forgetting" case). And there is "post-masking", where a sound is imperceptible because of a masking sound that preceded it. And yes, this is due to inherent slowness / lack of temporal resolution in the auditory system.

Temporal masking is widely exploited in all kinds of lossy audio compression (MP3, AAC, etc.) to remove data that cannot be perceived anyway.

We just don't resolve detail to that temporal degree. You can't really "listen between" the periods of a 100 Hz sound, so being unable to recognize a 10 ms event preceding a much louder one is expected.