Why xHE-AAC is being embraced at Meta (engineering.fb.com)
61 points by devtailz 1167 days ago
10 comments

This codec isn’t supported on the web. So it is Android and iOS only. This post talks about the improved audio quality, but I don’t buy it. This was probably done to reduce storage and bandwidth, not to improve audio quality. Phone speakers aren’t good enough to notice.

Also, I wish they had backed opus…

Reduced bandwidth obviously leads to better quality at the same target bit rate. Phone speakers are good enough to reproduce some obvious compression artifacts of, say, 32k/64k HE-AAC, let alone higher end headphones. ABR could definitely help.
>>Reduced bandwidth obviously leads to better quality at the same target bit rate

or, more likely, lower bandwidth at similar quality.

> Phone speakers

Hmm... You could detect the output type (built-in speakers, headphones, bluetooth) and change the audio quality accordingly.

The phone speaker mention is a response to gp’s claim that “phone speakers aren’t good enough to notice”, which is not true. We’re not talking about 128k aac vs lossless here.
Their web version seems to be the fallback (h264 etc) that is guaranteed to run on every device from the last 20 years pretty much. Then again, they clearly don't care about anything outside of their apps, the web experience (instagram especially) is a complete disaster of a walled garden with highly gimped functionality and extreme limitations everywhere.

The only good customer is one that willingly opts into all ads I guess.

> The only good customer is one that willingly opts into all ads I guess.

There's indeed some truth to that as well in the sense that the customer is the one who wants the ad to be displayed in the first place and pays for it. So it's a form of opt in as well.

That being said, the user who sees the ads is not the customer but the product. And yes, the best sellable product (i.e. user) for Meta (Facebook, Instagram etc), Alphabet (Google etc) and the like is the one that willingly opts into all ads.

Opus really is great. My first choice almost all the time.

Except... I wanted to stream music across the network to another computer with the lowest possible latency. I didn't care about the extra quality you get from lossless (opus sounded fine to me at 96kbps), but chose flac because it has almost no latency. Opus (and for that matter, all other codecs) added noticeable latency.

Opus goes down to 5ms if you customize it, and there's no point in going any lower for Internet streaming because 5ms of raw audio data is only 480 bytes at 16-bit 48khz (plus lots of UDP packet overhead). For latency-sensitive local streaming things are a lot different, and most of your latency will be from buffering to compensate for network jitter.
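As a quick sanity check on the 480-byte figure, here is a minimal sketch of the arithmetic. It assumes a single (mono) channel, which the comment above does not state; `pcm_bytes` is an illustrative name, not part of any library.

```python
def pcm_bytes(duration_ms: float, sample_rate_hz: int = 48_000,
              bits_per_sample: int = 16, channels: int = 1) -> int:
    """Size in bytes of uncompressed PCM audio of the given duration."""
    # Samples per channel in the given time span.
    samples = round(sample_rate_hz * duration_ms / 1000)
    return samples * channels * (bits_per_sample // 8)


mono_5ms = pcm_bytes(5)                # 480
stereo_5ms = pcm_bytes(5, channels=2)  # 960
```

So the 480 bytes matches mono; stereo at the same settings would be twice that.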

I used Opus for a VOIP app and I was really impressed with it. It's all I would ever use for Internet audio.

I used both opusenc and ffmpeg, and twiddled with the configuration endlessly for about a week.

Edit: I'll elaborate here. I went all sciencey and wrote a script to test all combinations of a range of encoder settings that included ranges of comp, max delay, frame size, bitrate etc. Multiplied out, that was a lot of different combinations. I ran the script (which took ages because I had to record the startup latency for each one and capture audio on the remote end, logging ffmpeg output etc etc). Sometimes a particular setting was so bad across 5 tests (with different values for other settings) that I'd scrap that whole batch to save time. The ranges of configurations that were promising were used "in real life" to play music in our living room. I narrowed them down to a few that were "usable", but I asked myself "if I bought this thing as a product, would I send it back?". The answer was "yes".

FWIW, the sender was a Macbook Pro 2015 and the receiver was a Raspberry pi 4B

TL;DR At or below 5ms it had reliability issues (drop-outs) that would occur randomly but frequently enough to ruin the music listening experience (like, even a 1s drop per song is completely unacceptable). But it also seemed to stretch audio because the delay would increase over time. Even when I got a high quality (zero drop-out) stream with acceptable latency, the latency would slowly increase over time. So about 1 hour into listening, if I paused the source to answer a call, the decoder end would continue playing for 30+ seconds.

I realised that over a LAN, streaming raw audio technically works, with almost zero latency, so I knew that it was the encode/decode that introduced the latency. That's when I switched over to flac, and noticed an immediate improvement in initial latency, but also stopped having the increasing delay over time.
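To put the growing delay in numbers, here is a back-of-the-envelope sketch. It assumes the backlog grew steadily and reads "30+ seconds after about 1 hour" as exactly 30 s over 3600 s; it only quantifies the mismatch and says nothing about what caused it.

```python
def rate_mismatch_ppm(backlog_s: float, elapsed_s: float) -> float:
    """Parts per million by which audio arrives faster than it is played."""
    return backlog_s / elapsed_s * 1_000_000


# Assumed figures taken from the description above: 30 s backlog after 1 hour.
mismatch = rate_mismatch_ppm(backlog_s=30, elapsed_s=3600)
```

That works out to roughly 8,333 ppm, or about 0.8%.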

An Opus frame can be as short as 2.5 ms [1] (at this packet size the effect of network buffering can be pretty obvious), but I use a frame size of 20 ms anyway when capturing on Windows, since this is what `cpal` gives me.

[1] https://opus-codec.org/docs/opus_api-1.3.1/group__opus__enco...
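The cost of very short frames in packet headers can be estimated with a small sketch. It assumes one Opus frame per packet and counts only a minimal IPv4 header (20 bytes) plus a UDP header (8 bytes); RTP or any other framing would add more. The function name is illustrative.

```python
IPV4_UDP_HEADER_BYTES = 20 + 8  # minimal IPv4 header + UDP header, no RTP

# Standard Opus frame durations in milliseconds.
OPUS_FRAME_MS = (2.5, 5, 10, 20, 40, 60)


def header_overhead_kbps(frame_ms: float) -> float:
    """Header-only bitrate when each frame travels in its own packet."""
    packets_per_second = 1000 / frame_ms
    return packets_per_second * IPV4_UDP_HEADER_BYTES * 8 / 1000


overhead = {ms: header_overhead_kbps(ms) for ms in OPUS_FRAME_MS}
```

Under these assumptions, 2.5 ms frames cost about 89.6 kbps in headers alone, versus about 11.2 kbps at 20 ms.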

There is a variant of Opus called Opus Custom which is intended to support ultra low latency applications. Might be worth looking into.

There is also this product by the Nullsoft guys, no idea if it's any good though: http://cockos.com/ninjam/

Something's wrong with your setup. Typical latency for opus is only ~20ms and you can get it lower than that by changing an encoder setting. FLAC is usually more than that, but it should be negligible regardless. Maybe you are trying to decode Opus on an extremely low power device?
Whenever I went for low latency, I noticed 2 things: 1. Actual latency was not what I set in the encoder because of frame sizes, chunk sizes, TCP windows, and decoder buffering. 2. Forcing super low latency caused drop-outs, which is worse for music than delays.

Overall I just found flac to be rock solid and lower latency without much tweaking.

How did you stream it, if I may ask? ffmpeg server and VLC/mplayer client or something similar?
Chrome has supported xHE-AAC since 2020 on Android, 2022 on macOS, and added support on Windows last week.
Interesting. What about Firefox? What about hardware acceleration?
It may just work in Firefox on OSes with xHE-AAC support, since they already rely on the system for AAC decoding IIRC. canPlayType/isTypeSupported probably indicate no support unless they've explicitly added support, though.

I'm not aware of any hardware acceleration.

They specifically mention that one of the advantages for them is the loudness management, to avoid having videos with widely different loudness ranges when their users are scrolling through their timeline.

I think Opus also has loudness management, but they seem fairly specific about the one that xHE-AAC provides:

> Instead of burning in a specific target level and dynamic range compression (DRC) profile during encoding, xHE-AAC allows us to leave the original audio characteristics untouched and delegate loudness management processing to the client via loudness metadata, for the optimal audio experience based on context.

So maybe xHE-AAC does provide a tangible benefit in this department.
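For intuition, here is a minimal sketch of what client-side loudness normalization from metadata could look like. It is a simplification, not MPEG-D DRC itself: it assumes the stream carries one measured programme loudness value and that the client picks its own target. The function name and the example loudness values are hypothetical.

```python
def normalization_gain(measured_lufs: float, target_lufs: float) -> float:
    """Linear gain that moves audio from its measured loudness to the target."""
    gain_db = target_lufs - measured_lufs
    # Convert decibels to a linear amplitude multiplier.
    return 10 ** (gain_db / 20)


# Hypothetical clips: one quiet, one loud, both played at the same target.
quiet_gain = normalization_gain(measured_lufs=-20.0, target_lufs=-14.0)
loud_gain = normalization_gain(measured_lufs=-8.0, target_lufs=-14.0)
```

The quiet clip gets boosted (gain above 1) and the loud one attenuated (gain below 1), while the encoded audio itself stays untouched, so a different client could pick a different target.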

Well less bandwidth and storage is also good for users.

Less bandwidth. Some users pay for bandwidth. Or have a monthly cap. So less bandwidth is a monetary advantage for users. It also may decrease time to play.

Similar, but less critical, for storage.
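To make the data-cap point concrete, here is a rough sketch of how much data an hour of audio uses at a given bitrate. The bitrates below are illustrative assumptions, not figures from the article, and container overhead is ignored.

```python
def megabytes_per_hour(bitrate_kbps: float) -> float:
    """Audio payload per hour in megabytes (1 MB = 1,000,000 bytes)."""
    bits_per_hour = bitrate_kbps * 1000 * 3600
    return bits_per_hour / 8 / 1_000_000


# Hypothetical bitrates, chosen only for comparison.
usage = {kbps: megabytes_per_hour(kbps) for kbps in (24, 32, 64, 128)}
```

At these assumed rates, going from 64 kbps to 32 kbps halves the audio payload from about 28.8 MB to about 14.4 MB per hour.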

One rule of thumb when switching codecs is "don't decrease quality". I buy that Meta did that. Which would result in a slight improvement in quality (of course sold as big and noticeable) while getting storage and bandwidth down (or allowing for better quality in the bandwidth-adaptive case).

That said. Yes. OPUS! Although other sites got quite some bad press for low bitrate opus.

Why can’t it be supported in web browsers? Is there a technical limitation or it just hasn’t been done because only Meta uses it?
Patents. I presume that the Facebook and Instagram apps bundle their own decoder (which Meta paid for), but you need to convince other companies (or wait for OS integration for Firefox) to include xHE-AAC.
you can decode image files on web using wasm... i think perf suffers a bit though. audio i don't know, that's probably harder. you would probably lose the benefit of loudness normalization iiuc.
When I set up my jellyfin server I learned that Firefox doesn't support h.265 and that there's no way to add codecs to the browser.

Does anyone know why it isn't possible to add codecs to the browser?

You need to pay for the necessary patents to include an H.265 decoder if you intend to distribute software in countries that care about silly things like software patents.

Mozilla famously relied on Cisco to gain h.264 playback, because the patent license included a cap on the total price you needed to pay after exceeding a certain number of devices. Cisco exceeded that number, so any additional devices they supported were practically free, and they released a free h.264 decoder plugin for Firefox. That pricing loophole has been removed from h.265, so a Mozilla decoder would cost them a lot of money.

In theory you could add codecs to your browser just fine, but Mozilla has already indicated it's not planning on adding h.265 to their browser, focusing on AV1 support instead.

Perhaps Mozilla would welcome patches to allow forwarding h.265 streams to the OS for decoding, like Chrome does, but they're not going to put the effort in themselves. With hardware AV1 decoding finally on the rise, I don't think that's a bad decision necessarily, companies and websites opting for the patent ridden format over the open format knew what they were doing.

>Mozilla famously relied on Cisco to gain h.264 playback

No, they didn't. OpenH264 only supports Constrained Baseline profile, and is only useful for WebRTC. Firefox has used system libraries/frameworks for H.264 video playback since long before OpenH264 came around.

I remember when I first got into this kind of stuff that you could just download a divx codec pack and put the .dlls somewhere in windows XP so that your programs could use them, why can't I do that with firefox and h.265?
Chrome quietly supports h265 decode as of version 107 https://chromestatus.com/feature/5186511939567616
What’s quiet about it? Seems like it was announced.
Because more often than not it has to be implemented at the OS level, and no one wants to do that for many, many reasons, and patents are probably the least of them.

For example, for most modern formats software-only encoding is very slow, and you really want to have hardware support for it [1]. But to add hardware support to increasingly complex and varied specs is not a decision you make lightly.

[1] Google is forcing companies to support its codecs in hardware on pain of removing Youtube e.g. https://arstechnica.com/gadgets/2021/04/roku-vs-google-part-...

People will often use the same bluetooth headphones to listen to music regardless of what kind of computer they’re using. I don’t think it’s safe to assume mobile devices have worse sound.
>This codec isn’t supported on the web. So it is Android and iOS only.

So it is supported on the web.

>This was probably done to reduce storage and bandwidth, not to improve audio quality.

Packing more information into the same amount of data is the literal definition of better quality.

>Phone speakers aren’t good enough to notice.

Maybe not, but phones can be connected to headphones and speakers which can. Especially if they connect via 3.5mm audio jack.

Built in sound level management sounds like a good enough reason
This change breaks video in stories on the website.
don't people wear headphones?
Looks like xHE-AAC beats opus on <32 kbps, which must be important for FB at their traffic scale

Topic: A Session In The Abyss: xHE-AAC vs OPUS at 12, 24 and 32 kbps (voice & music) (Read 16655 times) https://hydrogenaud.io/index.php/topic,120997.0.html

xHE-AAC vs HE-AAC v2 Audio Quality Comparison: War of the Low-Bitrate Codecs https://www.youtube.com/watch?v=74SsKOUHgvo

It's sort of bonkers that we've quietly gotten to the point that codecs for music are getting compared at the 32 kbps level, for full stereo. Audiophiles used to constantly be making comparisons between codecs to see what's superior, but that whole field seems to have died when Opus became very difficult to ABX at 128 kbps. There hasn't been a HydrogenAudio multiformat test in nearly a decade [1], and that one was 96 kbps.

These days even YouTube videos get transcoded to 128 kbps Opus, which is astonishing. Provided that the source is high enough quality, you can listen to transparent audio on the lowest-tier streaming platform.

[1] https://wiki.hydrogenaud.io/index.php?title=Hydrogenaudio_Li...

Yeah, opus is incredible for speech, but at those bitrates it gives music a kind of sibilant buzzing that I find really hard to listen to, whereas the muffled/underwater effect of mp3 and aac isn't much worse than just a lowpass filter.

What I found a few years ago was that explicitly lowpassing beforehand to match the other encoders (around 4-8kHz) gets rid of the buzzing. Apparently opus' threshold for automatically doing this is a lot lower than the other encoders; it keeps higher frequencies at the cost of more artifacts. I concluded that decimating or setting the encoder to 8kHz/"wideband" or lower was an improvement, with a similar resulting quality to AAC, but I still slightly preferred Fraunhofer at 12-28kbps.

By default, 12 and 24kbps are particularly bad; at 16k the quality goes up from going mono-only, and by 8k bandpassing is automatic.

>Looks like xHE-AAC beats opus on <32 kbps

To be fair, that tester indicated that only voice was OK, and everything else was just bearable, on the condition that you are consciously working under a severe bandwidth constraint: "EDGE poor network is still enough to stream music without issue (in a tunnel, on mountain…)"

As far as I know, xHE-AAC has no advantages over Opus, and Opus is free from patent licensing.
>xHE-AAC has no advantages over Opus

The main features focused on in this article are loudness management and adaptive bitrate audio. From what I can tell Opus supports neither of those things.

Opus does support adaptive bitrate audio and in basically the same way; with the reference encoder, you can use OPUS_SET_PREDICTION_DISABLED to create a stream access point.

Loudness management isn't really the purview of the decoder/compressed format itself; the article describes using MPEG-D DRC which is, at least in principle, independent of the choice of xHE-AAC vs. Opus. To be clear I have no idea how companies do dynamic range compression in practice; in the Web setting, maybe via stuff like https://developer.mozilla.org/en-US/docs/Web/API/DynamicsCom... etc.?

A specific kind of adaptive bitrate where you have pre-encoded "lanes" at different bitrates and you can switch seamlessly at certain points.
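The "lanes" idea can be sketched roughly as follows. This is not Meta's algorithm, just a minimal illustration: the lane bitrates, the safety factor, and the function names are all hypothetical, and switching is assumed to happen only at segment boundaries where every lane has a clean switch point.

```python
from bisect import bisect_right

# Hypothetical pre-encoded lanes, sorted ascending by bitrate in kbps.
LANES_KBPS = [12, 24, 32, 64]


def pick_lane(throughput_kbps: float, safety: float = 0.8) -> int:
    """Highest lane that fits within a safety fraction of measured throughput."""
    budget = throughput_kbps * safety
    index = bisect_right(LANES_KBPS, budget)
    # Fall back to the lowest lane when even that exceeds the budget.
    return LANES_KBPS[max(index - 1, 0)]


def plan(throughput_per_segment: list[float]) -> list[int]:
    """Choose one lane per segment from per-segment throughput estimates."""
    return [pick_lane(t) for t in throughput_per_segment]


lanes = plan([100, 35, 20, 10])  # [64, 24, 12, 12]
```

Because every lane is encoded ahead of time, the player only has to decide which file to fetch next; no re-encoding happens when conditions change.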

I'm not convinced the loudness metadata makes much difference, especially for the entity doing the encoding. Can it do compression at playback time too? That's sort of implied but I'm not sure.

> Can it do compression at playback time too

Yes, it has dynamic range control too.

Exhale - Open Source USAC / xHE-AAC encoder

https://gitlab.com/ecodis/exhale

Have fun downloading from Instagram or Facebook in the future. Besides VLC, what software can play this format? What does the licensing look like? Will it take 20 years until I can actually use it (at least patents weren't disneyfied)?
How long until the patents expire?
20 years.
Sooo... How is the loudness management feature not just a reimplementation of "mp3 replay gain" that was introduced over 20 years ago?
Has anybody else noticed that, after following the link, the back button does not indicate that there is a page to go back to? This is on Firefox on Linux. Wtf?

[edited: punctuation]

Do you have the Facebook container extension?
Yes indeed. That might explain it.
great, thanks for using more patented crap.
