by mmglr 1973 days ago
Some articles found by googling [1] [2] from two years ago describe this capability as an "ultrasonic watermark", so it is not new. I think this is coming to light because Zoom has become popular with the pandemic. For a journalist wanting to sanitize audio, I would think they need to remove anything higher than 15 kHz.

[1] https://www.nojitter.com/video-collaboration-av/zoom-takes-v...

[2] https://venturebeat.com/2019/01/22/zoom-is-bringing-ultrason...


Audio watermarking is old hat, and it’s FAR easier for Zoom than for, say, a music service, because people are used to imperfections/stuttering/blurring in their Zoom calls, any of which could just be encoded watermarks.
A former colleague did an analysis of how UMG watermarks its tracks on Spotify: https://www.mattmontag.com/music/universals-audible-watermar...
Pasting a comment I found interesting & funny from one of the commenters on that article:

"...It is a strange thing that the real quality audio is now reserved for the pirates. This industry really knows how to hit a target."

Listening to the samples (I got a nice BeO over-ears headset that has very good performance), I also realized that Spotify gives me some noise, I also thought it's a codec/digital thing.. little do I know..

I'm a Spotify subscriber but I'd be the first to admit that Spotify's audio quality isn't great to begin with, even when set to high quality streaming. It's noticeably worse than uncompressed CD quality (ignoring CDs that were mastered from sources that were lossily compressed to begin with - what a great trend that was).

This isn't a complaint, more an observation: Spotify works really well if you're outside, in a car, or even in an office environment with plenty of low level background noise. It's not so great when you run it through a half-decent hifi in your own home. Still, good enough for casual listening. However, if you're paying attention, you'll notice the flaws easily.

So, some of that noise is probably just that: noise. But some of it will also no doubt be the watermark.

> However, if you're paying attention, you'll notice the flaws easily.

At least you can tell yourself that your setup was worth the money because you can hear the difference ;)

[comment meant to be mostly tongue in cheek - I agree with your comment]

Haha - yeah, fair.

Fortunately I didn't pay that much for my setup. Amp and speakers are about 30 years old and were given to me by my stepdad about 20 years ago. Pretty much everything else is second hand from eBay and 25 - 40 years old (CD player, tuner, tape, EQ).

The biggest expense is the subwoofer, which I did buy new because used prices for a decent subwoofer are still pretty high, especially when you factor in the cost of petrol to go and collect the thing (most people don't want to post because they weigh a lot).

The only other new components are an inexpensive Bluetooth 5.0 receiver, the speaker cable (Bassface, which I want to say was about £2/metre - super-cheap by audiophile standards) and gold-plated banana plugs from RS components. All the interconnects are, I think, Amazon Basics.

So my total expenditure for the whole system is less than £1,000. Fully half of that is the subwoofer. Admittedly, that's still probably a fair bit by most people's standards, especially when it's perfectly possible to get very good sound from a hifi separates system for £250 or so (see Techmoan's video series on the topic, for example: https://www.youtube.com/watch?v=lSY1iZqH118), but it's chickenfeed for most audiophiles. Still, I'm definitely not one of those guys: it sounds more than good enough to me and I've no desire to fall any further into that particular black hole.

Except for one thing... I don't have a turntable. So what I'm probably going to do is buy a pair of SL1210s and a mixer to plug in to the system. I'm lucky enough to have a fair number of 12" singles from a freecycle "barn find" type situation a few years ago, and another time-consuming hobby to get through the rest of this pandemic will be no bad thing.

There is both a danger and a satisfaction to mostly cobbling together a nice sounding system from lots of second-hand parts though. The temptation for me is to do the same again with one or two of the other rooms in the house.

> BeO

Is that a brand or are they actually putting beryllium oxide in headphones these days?

At first I thought they meant Bang & Olufsen, a high-end brand that prefixes all their products with Beo [0]. But I guess the industry is making Beryllium drivers now [1].

[0] https://www.bang-olufsen.com/en/headphones/filter/over-ear

[1] https://blog.masterdynamic.com/article/know-your-sound-tool-...

Both your comments rock!!! Yes it's Bang & Olufsen :)
Nice! But the article is 7 years old; I wonder how things are now.
This is great! Thanks for linking it.

Looks to me like the Arbitron system can work with the Universal one... hmmm.

More than one mark at a time should be on the radar.

But then for the same reason, it's also easier to strip out.
This is very poor opsec advice. Robust audio watermarking has been standard technology for many years now, and can be licensed from multiple vendors. If Zoom (or any other actor) cares enough to watermark their audio, you must assume that it may be hard to detect and remove.
A vocoder would probably do a good job, considering it'd put the audio into a speech basis.
A vocoder is probably the best off-the-shelf technology. Of course, a challenge with making invasive changes to the audio (in order to defeat watermarking) is that people may claim that the audio is fake/misrepresented. Vocoded audio will not sound like the original speakers, and may have artifacts. Lipsync may also be slightly off. So one would have to be careful to communicate these limitations, which the general public may not have much interest in understanding... Adversarial opponents may latch on to these things and use them to discredit the recordings.

A more conservative approach would be to transcribe the audio into text, and only offer the audio to (more) trusted parties for verification.

Agreed. My comment was poorly worded as I did not intend to give advice.
I would not trust that. Seriously.

Reasonably effective stream watermarking happens every day and is done in the human vocal range with almost no listener impact.

In radio, Arbitron has a system working well within the lower audio range, even on AM radio. AM is typically 5 kHz bandwidth.

They use a spectral masking technique able to encode ID bits into streams that can be decoded with portable devices.

PPM: Portable People Meter

Frankly, this kind of thing would go unnoticed by pretty much all listeners.

From the PDF I linked:

[...]all watermarking technologies use the well-known perceptual principle of “masking,” which was first reported in the early 20th century and is a core technical basis for mp3, AAC, and a host of data-rate reduction schemes.

In simple language, a loud burst of energy at one frequency will deafen the human auditory system to certain other audio components at nearby frequencies for a period of time before, during, and after the loud signal.

Consider the following illustration: A tone burst at 1.1 kHz with an intensity of 0 dB will hide (make imperceptible) an added signal at 1.11 kHz with a level of -30 dB for a period of about 10 ms before the burst and as much as 50 ms after the burst. However, modern signal-processing techniques can still detect the existence of this added 1.11 kHz component even though the ear cannot.
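
To illustrate the last sentence of that paragraph, here is a minimal sketch (not Arbitron's detector) that measures the level of a single frequency by correlating the signal with a complex sinusoid. The 8 kHz sample rate, the 1 second analysis window and the function name are my own assumptions; only the 1.1 kHz / 1.11 kHz / -30 dB figures come from the quote. Note that separating two tones 10 Hz apart needs a window of roughly 100 ms or more, so a real decoder working on short bursts has to be cleverer than this.

```python
import cmath
import math

def tone_level_db(samples, freq_hz, sample_rate):
    """Level of one frequency component, in dB relative to amplitude 1.0."""
    acc = 0j
    for i, x in enumerate(samples):
        # Correlate with a complex sinusoid at the frequency of interest.
        acc += x * cmath.exp(-2j * math.pi * freq_hz * i / sample_rate)
    amplitude = 2 * abs(acc) / len(samples)
    return 20 * math.log10(max(amplitude, 1e-12))  # floor avoids log(0)

SAMPLE_RATE = 8000                 # assumption, not from the quote
NUM_SAMPLES = SAMPLE_RATE          # 1 second window (assumption)
HIDDEN_AMP = 10 ** (-30 / 20)      # -30 dB relative to the masker

def masker(t):
    return math.sin(2 * math.pi * 1100 * t)               # 0 dB tone at 1.1 kHz

def hidden(t):
    return HIDDEN_AMP * math.sin(2 * math.pi * 1110 * t)  # added 1.11 kHz tone

times = [i / SAMPLE_RATE for i in range(NUM_SAMPLES)]
clean = [masker(t) for t in times]
marked = [masker(t) + hidden(t) for t in times]

clean_db = tone_level_db(clean, 1110, SAMPLE_RATE)
marked_db = tone_level_db(marked, 1110, SAMPLE_RATE)
print(f"1110 Hz level without mark: {clean_db:.1f} dB")
print(f"1110 Hz level with mark:    {marked_db:.1f} dB")
```

The component the ear would miss stands out clearly in the measurement, while the unmarked signal shows essentially nothing at that frequency.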

This is the basis of PPM and other similar watermarking technologies that use masking for determining the frequencies and intensity of the data that can be added for the station-identifying watermark.

The PPM system constructs 10 spectral channels in the region from 1.0 kHz to 3.0 kHz. The original program audio energy in each channel is evaluated for its ability to mask an added component. If that masking energy is insufficient, nothing is added. Conversely, if the energy in a channel is large enough, a tone is injected, chosen from one of four possible frequencies within the channel. For example, the channel centered at 1058 Hz might have one of the following four frequencies injected: 1046, 1054, 1062, or 1070 Hz.

Each of the four frequencies represents 2 bits of information. If we assume that this process repeats at a 500 ms rate, using all channels provides 40 bits per second or 2400 bits per minute of watermark code. Let’s further assume that a radio station is credited for a listener if any code is correctly detected within a 3-minute interval. With the very large number of encoded bits generated in 3 minutes (2400 x 3 = 7200 bits) and a station’s identification data needing perhaps only 50 bits, there is massive excess capacity for redundancy, error correction, and for audio that does not have enough high-frequency content for masking.

https://blogalytics.typepad.com/files/a-technical-look-at-ar...
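
The arithmetic in that quote is easy to check. A small sketch, using only the figures quoted above (10 channels, 4 tones per channel, one symbol every 500 ms, a 50-bit station ID, a 3-minute credit window). The mapping from bit pairs to tones is purely my assumption for illustration; the PDF excerpt does not say which tone means which bits.

```python
import math

CHANNELS = 10
TONES_PER_CHANNEL = 4
SYMBOL_PERIOD_S = 0.5
STATION_ID_BITS = 50
CREDIT_WINDOW_MIN = 3

# Four distinguishable tones carry log2(4) = 2 bits each.
bits_per_tone = int(math.log2(TONES_PER_CHANNEL))
symbols_per_second = 1 / SYMBOL_PERIOD_S

bits_per_second = round(CHANNELS * bits_per_tone * symbols_per_second)
bits_per_minute = bits_per_second * 60
bits_in_window = bits_per_minute * CREDIT_WINDOW_MIN
# How many complete copies of the ID fit in one credit window.
full_id_copies = bits_in_window // STATION_ID_BITS

# Example channel from the quote (centered at 1058 Hz).
EXAMPLE_TONES_HZ = [1046, 1054, 1062, 1070]

def tone_for_bits(two_bits):
    """Hypothetical mapping: the 2-bit value is used as an index into the tone list."""
    return EXAMPLE_TONES_HZ[two_bits]

def bits_for_tone(freq_hz):
    """Inverse of the hypothetical mapping above."""
    return EXAMPLE_TONES_HZ.index(freq_hz)

print(bits_per_second, bits_per_minute, bits_in_window, full_id_copies)
print(tone_for_bits(0b10), bits_for_tone(1070))
```

So even if most symbols are skipped because the program audio cannot mask them, there is room for the ID to be repeated many times within the window.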

So if masking is used, I assume compressing the audio with any modern compression scheme from mp3 up should defeat that, shouldn't it (because they drop masked signals to save bandwidth)?
My advice here would be to do some analysis on known watermarked audio, and/or go digging through patents, software, and firmware.

The first step is to ID marks successfully.

Only then can means and methods to evade the mark be trusted.

I would look really hard at what the radio industry has done. They faced very significant challenges as internet advertising rose up to dominate.

The incentives to get this right and be super robust are all there and are time tested, production proven today.

Depends. The Arbitron system works through the HD Radio codec, which is a wavelet codec. It is basically a hybrid mp3-type codec coupled with high-frequency reconstruction on the receiver side.

Interestingly, that literally means fake signals on the receiving end above 8 to 10 kHz! It was as low as 5 kHz when used for AM, and may still be. I have not kept up.

I could tell early on. It has improved a lot since then.

The Arbitron system appears robust. Noise, low signal quality, etc... do not generally impact it much. The effective bitrate needed is very low.

Given a larger sample of audio, it is likely to work.

A robust watermarking system will include some sort of error correction, so the answer is that it might, it depends on how much error it introduces.
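
As a toy illustration of "it depends on how much error it introduces", here is a sketch using the simplest possible scheme: repeat the payload several times and take a majority vote per bit. This is an assumption for the sake of example; real watermarking systems presumably use stronger codes, and the payload, copy count and corruption patterns below are made up.

```python
def encode(payload, copies):
    """Repetition code: send the whole payload `copies` times in a row."""
    return payload * copies

def corrupt(bits, every):
    """Flip every `every`-th bit, standing in for damage done by a codec."""
    return [b ^ 1 if i % every == 0 else b for i, b in enumerate(bits)]

def decode(received, payload_len):
    """Majority vote across the copies, one payload position at a time."""
    copies = len(received) // payload_len
    decoded = []
    for pos in range(payload_len):
        votes = sum(received[pos + k * payload_len] for k in range(copies))
        decoded.append(1 if votes * 2 > copies else 0)
    return decoded

payload = [1, 0, 1, 1, 0, 0, 1, 0]   # made-up 8-bit ID
sent = encode(payload, copies=5)

light = decode(corrupt(sent, every=3), len(payload))   # a third of the bits flipped
heavy = decode(corrupt(sent, every=2), len(payload))   # half of the bits flipped

print("survives light damage:", light == payload)
print("survives heavy damage:", heavy == payload)
```

With a third of the bits flipped, no payload position loses more than two of its five votes, so the ID is recovered; with every second bit flipped, some positions lose all five votes and the decode fails.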

A purpose built algorithm designed to thwart watermarking however is far more likely to be successful than a compression algorithm that is designed to maintain the integrity of the audio.

> period of about 10 ms before the burst

Does the human auditory system work in "batches" or do we "forget" the other signal once the burst comes in?

The phenomenon described by the quoted comment is called "temporal masking". There is "pre-masking", where a sound is rendered imperceptible by a sound that _follows_ it (your "forgetting" case). And there is "post-masking", where a sound is imperceptible because of a masking sound that preceded it. And yes, this is due to inherent slowness / lack of temporal resolution in the auditory system.

Temporal masking is widely exploited in all kinds of lossy audio compression (MP3, AAC etc.) to remove data that cannot be perceived anyway.

We just don't resolve detail to that temporal degree. You can't really "listen between" the periods of a 100 Hz sound, so being unable to recognize a 10 ms event preceding a much louder one is expected.
"Ultrasonic" does not always mean high frequency.

Could mean extreme subtlety too.

Literal meaning here is, "beyond hearing"

Have you ever seen a real world example of 'ultrasound' used to mean anything other than higher-than-audible frequencies?

Wikipedia's definition has it explicitly higher. https://en.wikipedia.org/wiki/Ultrasound

That page says:

> Ultrasound is defined by the American National Standards Institute as "sound at frequencies greater than 20 kHz".

'Ultraviolet' means only higher-than-violet.

We have another word 'infrasound' for lower-than-audible frequencies.

I'm a native English speaker technically interested in sound for 40 years and I have never heard the 'subtlety' usage.

This is an entirely fair comment. And it's typical of my experience as well, and I have a fair amount related to audio, though not as extensive as yours.

My mind works differently when it comes to language and the scope of possible meanings is something I always consider relevant.

What concerned me here was someone taking the colloquial definition of "ultrasound" literally, and making assumptions that are not valid in this context at all.

What the word actually conveys is both a matter of subtlety and frequency.

Turns out, having read the entire discussion, both are relevant in terms of threat assessment, and thinking about what is said more deeply can have a positive impact on a discussion of this nature.

All of which is why I chose to point out what "ultrasound" actually does mean linguistically.

Edit: In my experience, such uses can and do happen. I personally allow for it and use context to parse. Where there is ambiguity, I generally won't dismiss it out of hand.

Subsonic comes to mind here. As does the question why the word did not appear regarding these watermarks.

The answer may just be someone with far less domain expertise attempting to communicate.

It means "above hearing". Beyond (through, across) is "trans". I would say low, both volume and frequency to be sub (like in subtlety).
No, it means "beyond." Like you point out, "across" means something else. The Latin for "above" is supra or super.

> ultra-, prefix:

> 1. Signifying ‘lying spatially beyond or on the other side of’

> 2a. With adjectives, signifying ‘going beyond, surpassing, or transcending the limits of’ (the specified concept).

> Etymology: Latin ultrā beyond, employed as a prefix in the post-classical ultrāmundānus ultramundane, and the later ultrāmarīnus ultramarine, and ultrāmontānus ultramontane.

- Oxford English Dictionary Online

Your quick trip through the etymology triggered an opsec thought:

It may be worth a suggestive talk to expand how people take words.

A pop culture reference would be Daniel Jackson from the series, "SG-1"

We may often be constrained in our ability to understand and assess by our own preconceptions relating to language.

"Ultrasonic" was interpreted very differently by any number of us having this watermarking discussion. How often do we make assumptions about the possible field of play based on language basics?

How often do those fail to be sufficiently inclusive?

I bet it happens more than we realize.

Seems like a good basis for a DEFCON talk. "Where is Daniel Jackson when your team needs him?"

I was focused on the etymology; the actual usage of "ultrasonic" is generally confined to high pitches, not low.

Worthwhile point still, though I wouldn't have responded had the commenter not stated a specific incorrect definition. How does this connect to OPSEC though?

Indeed you were.

The relation goes right to threat and solution scopes. In this case, someone working from an incomplete definition may well also work within an incomplete set of greater assumptions.

There is what it could mean, what we take it to mean, and what it does mean.

Where those overlap or not could have a significant impact on behavior.

>> Literal meaning here is, "beyond hearing"

> It means "above hearing". Beyond (through, across) is "trans".

Where'd you get that from? Ultra is indeed "beyond"; "above" would be super.

And the Latin root for "hearing" is audi- (audience / audible / auditorium / etc.); son- is "sound".

If watermarking is ultrasonic shouldn't a simple low-pass filter defeat it?
I wonder if this is another marketing gimmick, similar to the end-to-end encryption controversy they got into. I hope by ultrasonic they just mean beyond hearing, and not that the watermark really lives exclusively in the ultrasonic frequency range.

Do they also talk about the process for identifying the participant who leaked the content based on the leaked recording? Do they need to retain the original copy of the recording to be able to extract the watermark?

"Ultrasonic watermark" reminds me of this[1] great blogpost. It's not the same thing but based on the same concept.

[1] https://blog.benjojo.co.uk/post/encoding-data-into-dubstep-d...

This was on the front page of HN earlier this week, wasn't it? It's beginning to feel like an echo chamber in here sometimes...