by symstym 3207 days ago
I've spent quite a lot of time working with the Web Audio API, and I strongly agree with the author.

I got pretty deep into building a modular synthesis environment using it (https://github.com/rsimmons/plinth) before deciding that working within the constraints of the built-in nodes was ultimately futile.

Even building a well-behaved envelope generator (e.g. that handles retriggering correctly) is extremely tricky with what the API provides. How could such a basic use case have been overlooked? I made a library (https://github.com/rsimmons/fastidious-envelope-generator) to solve that problem, but it's silly to have to work around the API for basic use cases.
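
To make the retriggering problem concrete, here is a minimal sketch of the bookkeeping involved, assuming a purely linear envelope. The names `Ramp`, `linearRampValue` and `retriggerAttack` are hypothetical and are not taken from fastidious-envelope-generator or from the Web Audio API. The idea is that a retrigger arriving mid-release has to start the new attack from the level the envelope has actually reached, otherwise the output jumps and clicks.

```typescript
// Hypothetical description of one scheduled linear segment.
interface Ramp {
  t0: number; // segment start time (seconds)
  v0: number; // value at t0
  t1: number; // segment end time (seconds)
  v1: number; // value at t1
}

// Value of a linear ramp at time t, clamped to the segment's endpoints.
function linearRampValue(ramp: Ramp, t: number): number {
  if (t <= ramp.t0) return ramp.v0;
  if (t >= ramp.t1) return ramp.v1;
  const progress = (t - ramp.t0) / (ramp.t1 - ramp.t0);
  return ramp.v0 + (ramp.v1 - ramp.v0) * progress;
}

// Plan a new attack that begins at the envelope's current level instead of
// jumping back to zero.
function retriggerAttack(current: Ramp, now: number, attackTime: number, peak: number): Ramp {
  const level = linearRampValue(current, now);
  return { t0: now, v0: level, t1: now + attackTime, v1: peak };
}
```

With real nodes, the resulting segment would then be handed to the parameter's scheduling methods; the point of the sketch is only that the caller ends up tracking the envelope state itself.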

Ultimately we have to hold out for the AudioWorklet API (which itself seems potentially over-complicated) to finally get the ability to do "raw" output.

6 comments

> I strongly agree with the author.

I will second this. I wanted to make a live streaming playback feature using the API so I could remotely monitor an audio matrix/routing system that I have in the office.

The API has _zero_ provision for streaming MP3. You either load and play back a complete MP3 file or you get corrupted playback because the API simply won't maintain state between decoding calls.

What I ended up having to do was write a port of libMAD to JavaScript and then use that to produce a PCM stream, which I _could_ then convert into an AudioBuffer, attach a timer, and then send into the audio API for correct playback.
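
The timing part of that workaround can be sketched roughly as follows. This is not the commenter's code: `ChunkScheduler` and its `leadTime` parameter are hypothetical, and the sketch assumes each decoded PCM chunk is started at an explicit time on the audio clock. Each chunk is scheduled to begin exactly where the previous one ends, unless the stream has fallen behind, in which case playback restarts a little after the current time.

```typescript
// Hypothetical scheduler for back-to-back playback of decoded PCM chunks.
class ChunkScheduler {
  private nextStart = 0;
  private readonly sampleRate: number;
  private readonly leadTime: number;

  constructor(sampleRate: number, leadTime: number = 0.05) {
    this.sampleRate = sampleRate;
    this.leadTime = leadTime; // safety margin in seconds (assumed value)
  }

  // Returns the start time for a chunk of `frameCount` frames, given the
  // current audio-clock time `now` (both times in seconds).
  schedule(now: number, frameCount: number): number {
    // Continue seamlessly if possible; otherwise recover from an underrun.
    const start = Math.max(this.nextStart, now + this.leadTime);
    this.nextStart = start + frameCount / this.sampleRate;
    return start;
  }
}
```

In a browser, the returned value would be the `when` argument passed to the source node's `start()` call, with `now` read from the context's `currentTime`.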

That is an insane amount of work to get around a gaping oversight in a common use case of the API; a simple flag in the browser's native decoder would've sufficed.

> The API has _zero_ provision for streaming MP3.

Did you look into Media Source Extensions[0,1]? Fetching and playing the various audio formats is a bit outside the purview of Web Audio. But you can feed streaming MSE into Web Audio. If I recall, you use Web Audio's `AudioContext.createMediaElementSource()` to use a (potentially chunked) MSE source with web audio, but it's been a while since I did this.

That said, Media Source Extensions (MSE) is only supported on relatively modern browsers (IE11+) but you should be able to use it to stream mp3 to the Web Audio API on supported browsers.

There's also a way to do this without using MSE for older browsers. See the 72lions repo below for an example[2]. It's a bit convoluted, but not as much work as your workaround. As described in the README of the 72lions proof-of-concept:

"The moment the first part is loaded then the playback starts immediately and it loads the second part. When the second part is loaded then then I create a new AudioBuffer by combining the old and the new, and I change the buffer of the AudioSourceNode with the new one. At that point I start playing again from the new AudioBuffer."

0. https://developer.mozilla.org/en-US/docs/Web/API/Media_Sourc...

1. http://dalecurtis.github.io/llama-demo/index.html

2. https://github.com/72lions/PlayingChunkedMP3-WebAudioAPI
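
The buffer-combining step quoted from the 72lions README boils down to concatenating sample data per channel. A minimal sketch, assuming each channel is available as a `Float32Array`; `appendChannelData` and `resumeOffsetSeconds` are hypothetical names, not functions from that repository:

```typescript
// Concatenate the samples played so far with a newly decoded part.
function appendChannelData(existing: Float32Array, incoming: Float32Array): Float32Array {
  const combined = new Float32Array(existing.length + incoming.length);
  combined.set(existing, 0);
  combined.set(incoming, existing.length);
  return combined;
}

// Offset (in seconds) at which the newly appended part begins inside the
// combined data, given how many frames the old buffer contained.
function resumeOffsetSeconds(existingFrames: number, sampleRate: number): number {
  return existingFrames / sampleRate;
}
```

The combined array would be copied into a fresh buffer for each channel, and playback restarted from the appropriate offset, which is where the "convoluted" part comes in.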

> Fetching and playing the various audio formats is a bit outside the purview of Web Audio

Just looking at that clause makes me think perhaps the Web Audio API should have been called something else.

Can you imagine writing "fetching and displaying various image formats is a bit outside the purview of HTML"?

(I realize that's a bit apples 'n oranges.)

I think it's more like "fetching and displaying various image formats is outside the purview of HTML5 canvas".

If you just want to show an image, you use an <img> tag; if you just want to play an audio file, you use <audio>. Canvas and the Web Audio API are for pages that want to make or mix their own images/audio. Though to be fair, HTML/JavaScript do make it easy to load image data from an image tag directly into a canvas; maybe there's a missing parallel for audio.

If I recall correctly, when we did that project a year and a half ago, MSE either wasn't available or its latency was entirely unacceptable. I should have noted that with the setup I described above we are able to achieve <150 ms of latency in most cases; and since the system also allows remote control of matrix sources and mixers, the low latency was required in order to accurately manipulate the system under certain working conditions.

And I'll third it.

The MP3 issues don't end there, which is something the article touches on obliquely: you can't reuse many of the important constructs you might want to.

Here's my use case. I have a couple of games (https://arcade.ly/games/starcastle, https://arcade.ly/games/asteroids), each of which has three pieces of music: title screen, in game, and game over. If you play the game a couple of times you're going to hear the title screen audio probably once, in game twice or more (because it loops from the beginning after every playthrough), and game over twice. To put it simply: I need to play the same MP3s multiple times each.

To play an MP3 you have to decode it, which is an expensive operation. Firstly it takes time to decode - enough time that the user will notice the lag even on a fast machine. However the main problem is the amount of memory use: decoding takes you from a couple of MB of compressed MP3 to potentially hundreds of MB of uncompressed audio. The problem worsens for multiple tracks.

I discovered the memory issues via Chrome Task Manager, when I noticed my page using hundreds of MB of native memory, and traced this usage back to the music. You can often get away with this when running on a desktop browser, but not so much on mobile.

You can mitigate the memory issue to some extent by dropping the sample rate of your uncompressed PCM audio to 22.05 kHz, which halves its uncompressed size relative to 44.1 kHz. Quality starts to suffer too much for music if you go much below this, though. (Note that I'm talking about the uncompressed sample rate, and NOT the MP3 bitrate. A 44.1 kHz MP3 encoded at 64 kbps and one encoded at 128 kbps will decompress to the same size, although the 64 kbps version will obviously sound worse because more information will have been lost.)
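
A back-of-the-envelope sketch of that arithmetic, assuming decoded audio is held as 32-bit float samples (the format an AudioBuffer exposes; what a browser actually allocates internally may differ). The function names and the three-minute stereo track are made up for illustration:

```typescript
// Approximate size of decoded PCM audio in bytes.
function decodedSizeBytes(
  durationSeconds: number,
  sampleRate: number,
  channels: number,
  bytesPerSample: number = 4, // assumption: 32-bit float samples
): number {
  return Math.round(durationSeconds * sampleRate) * channels * bytesPerSample;
}

// Approximate size of a constant-bitrate compressed file in bytes.
function compressedSizeBytes(durationSeconds: number, bitrateKbps: number): number {
  return (durationSeconds * bitrateKbps * 1000) / 8;
}

// A hypothetical three-minute stereo track:
const fullRate = decodedSizeBytes(180, 44100, 2); // 63,504,000 bytes
const halfRate = decodedSizeBytes(180, 22050, 2); // 31,752,000 bytes
const mp3At128 = compressedSizeBytes(180, 128);   // 2,880,000 bytes
const mp3At64 = compressedSizeBytes(180, 64);     // 1,440,000 bytes
```

The bitrate changes only the compressed size; the decoded size depends solely on duration, sample rate and channel count, so three such tracks held decoded at once already come to roughly 190 MB.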

But the inability to reuse a source buffer, which holds compressed audio, is absolutely aggravating, and something I've posted at length about here: https://github.com/WebAudio/web-audio-api/issues/1175. The reason you might want to do this is because it means you're only using as much memory as the compressed audio takes up and (hopefully) the rest will have been freed by the browser's runtime (no guarantees, obviously).

The downside of this approach is that you can't start a piece of music at a defined instant, which is extremely frustrating when you might want to synchronise it with events happening on screen.

Also, due to the re-decoding every time, and the asynchronous nature of such, I've now introduced a weird bug where it's possible to end up with both title and in game music playing at the same time if the user starts the game before decoding the title music is complete. It's fixable (although I haven't had time yet), but it's just one more irritation with a poorly designed API.
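
One common way to guard against this kind of race is to tag each request and ignore any decode that finishes after a newer request has been made. A minimal sketch, with the hypothetical name `LatestRequestGuard`; it is a general pattern, not something the Web Audio API provides:

```typescript
// Hands out increasing tokens; only the most recent token counts as current.
class LatestRequestGuard {
  private current = 0;

  // Call when the user requests a new piece of music.
  begin(): number {
    this.current += 1;
    return this.current;
  }

  // Call when an asynchronous decode finishes, before starting playback.
  isCurrent(token: number): boolean {
    return token === this.current;
  }
}

// Intended use (decodeTrack and play are hypothetical helpers):
//   const token = guard.begin();
//   const buffer = await decodeTrack(bytes);
//   if (!guard.isCurrent(token)) return; // a newer track was requested meanwhile
//   play(buffer);
```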

I'm actually thinking of going back to using the good old HTML5 AUDIO element just for playing music, since it seems a bit more reliable, but I need to do some experimentation to see what the memory impact is. I also had issues with AUDIO misbehaving quite badly in Firefox with multiple sounds playing simultaneously.

Sound effects are less of an issue because they're obviously quite short and therefore don't take an excessive amount of memory even when uncompressed, so I can at least keep buffer sources around for them. Nonetheless the API's excessive complexity shows through even here: why is it such a drama just to play a sound? Why do I need to create and connect a bunch of objects together just to play a single sound at a given volume? Ridiculous. Asinine.
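
For reference, the wiring being complained about looks roughly like the sketch below. The method and property names (`createBufferSource`, `createGain`, `connect`, `start`, `destination`, `gain.value`) are the real Web Audio ones; the `*Like` interfaces are hypothetical stand-ins declaring only what the function uses, so the sketch does not depend on browser type definitions.

```typescript
// Minimal structural stand-ins for the parts of the API used below.
interface NodeLike {
  connect(destination: unknown): unknown;
}
interface GainLike extends NodeLike {
  gain: { value: number };
}
interface SourceLike extends NodeLike {
  buffer: unknown;
  start(when?: number): void;
}
interface ContextLike {
  destination: unknown;
  createBufferSource(): SourceLike;
  createGain(): GainLike;
}

// Play one already-decoded buffer once at the given volume.
function playSound(ctx: ContextLike, buffer: unknown, volume: number): SourceLike {
  const source = ctx.createBufferSource(); // one-shot node, created per playback
  source.buffer = buffer;
  const gain = ctx.createGain();           // volume needs its own node
  gain.gain.value = volume;
  source.connect(gain);
  gain.connect(ctx.destination);
  source.start();
  return source;
}
```

That is two node objects and two connections for what an <audio> element does with a `volume` property and `play()`.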

I tend to think of the Web Audio API as the answer to the question: "how much of an audio API can you have if you stipulate that all user-specified code must run in the UI thread?".

Within that constraint I don't think it's a terrible API, but it's a big constraint and naturally raw access would be far preferable.

Yes... after I wrote my comment I was feeling a bit bad for sounding like I was just trashing the API. In a world where JS is slow and there is no worker thread machinery, yet you need low latency and flexible processing, the design makes more sense.

That being said, the AudioParam "automation" methods still make me want to cry.

Yeah, AudioParam's refusal to interpolate anything makes it really hairy to work with.

Comments on the spec suggest that there was something really complicated about the "cancelAndHold" method (`cancelAndHoldAtTime` in the spec, which I guess is still in NYI limbo), but I can't for the life of me figure out what it was.
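
Whatever the implementation difficulty was, the value such a method has to hold partway through an exponential ramp follows from the interpolation formula the spec gives for exponential ramps, v(t) = v0 * (v1 / v0) ^ ((t - t0) / (t1 - t0)). A minimal sketch, with the hypothetical name `exponentialRampValue`:

```typescript
// Value of an exponential ramp from (t0, v0) to (t1, v1) at time t.
// Assumes v0 and v1 are non-zero and share the same sign.
function exponentialRampValue(t0: number, v0: number, t1: number, v1: number, t: number): number {
  if (t <= t0) return v0;
  if (t >= t1) return v1;
  return v0 * Math.pow(v1 / v0, (t - t0) / (t1 - t0));
}
```

Without a hold method, code that wants to interrupt a ramp has to evaluate something like this itself and then set the result explicitly.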

"how much of an audio API can you have if you stipulate that all user-specified code must run in the UI thread?"

I just threw up in my mouth a little :/

The API is frustrating because it is meant to hide the fact that Android audio sucks giant hairy donkey balls.

If you give Web developers access to raw samples, they are going to expect it to work. When it doesn't on Chrome on Android, lots of people are going to start complaining and filing bugs.

So, instead of fixing the audio path, they decided to bury its crappiness under a "higher-level" API which has fuzzier latency and can be built with hacks in the audio driver stacks themselves.

Android audio is truly terrible for instrument apps. I don't understand how it suffices for things like games. I also don't understand why people even bother to make things like pianos and drum set… The latency is so extreme and inconsistent that even on recent phones they are useless. In contrast, iOS has had excellently playable instruments at least as far back as the iPod Touch 4.

Here's an interesting video on this topic from back in 2013: https://youtube.com/watch?v=d3kfEeMZ65c

That... explains why Google was so keen to kill off the Audio Data API Mozilla proposed, I guess.

But Chrome for Android didn't come out until 2012, and Chris Rogers started the Web Audio work in 2009. I think someone would have had to be exceptionally farsighted to think "Android's audio stack is going to suck for several years so we need to design around that now".

At that point in time, though, you could be forgiven for thinking "Sheesh. JavaScript is so painfully slow that nobody will ever pass PCM samples around in it."

So, at every point in time up to and including now, you've always got something resisting low-latency PCM. Android is just the latest reason.

Side note: it looks like Chris Rogers bowed out of Web Audio around the 2012/2013 timeframe.

You couldn't really, since AudioData worked pretty well.

Also, it was obvious JS perf would get better and better.

Ah, good point. I should have checked better on things before commenting!

This has been a known issue with Android since at least 2009 (~2,700 stars); work to address it is starting to trickle out this year.

https://issuetracker.google.com/issues/36908622

AAudio is a new C API. It is designed for high-performance audio applications that require low latency. It is currently in the Android O developer preview and not ready for production use. (Jun 2017)

There are not a lot of artists at Google.

Until this changes, the media APIs will lag as G attempts to maintain parity with other orgs. Google makes products for Google devs and incidentally for the world to use.

Aha, I had that eerie feeling that I had seen those crappy patterns somewhere else. So, Android it was.

I think there were alternative designs, but either no one cared or people were determined to push Web Audio despite its faults. One alternative that was proposed is http://robert.ocallahan.org/2012/01/mediastreams-processing-...

> (https://github.com/rsimmons/plinth)

This demo doesn't keep a straight 120 BPM on my machine; it can't hold the rhythm after 10 seconds of playback (I tried the first patch on the left, in the Edge browser).
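
As a general note rather than a diagnosis of Plinth: one common way to keep tempo steady is to derive every beat time from its index against the audio clock, and to schedule only the beats that fall inside a short lookahead window, instead of accumulating timer delays. A minimal sketch, with the hypothetical name `beatTimesInWindow`:

```typescript
// Beat times (seconds) that fall inside [windowStart, windowEnd), for a
// pattern that started at `startTime` and runs at `bpm` beats per minute.
function beatTimesInWindow(startTime: number, bpm: number, windowStart: number, windowEnd: number): number[] {
  const secondsPerBeat = 60 / bpm;
  // Index of the first beat at or after the start of the window.
  const first = Math.max(0, Math.ceil((windowStart - startTime) / secondsPerBeat));
  const times: number[] = [];
  for (let n = first; startTime + n * secondsPerBeat < windowEnd; n++) {
    // Each time is computed from the beat index, so timing error cannot accumulate.
    times.push(startTime + n * secondsPerBeat);
  }
  return times;
}
```

A timer callback would call this with a window starting at the current audio time and pass each returned time to the relevant `start()` or parameter-scheduling call, taking care not to schedule the same beat twice.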

That's unfortunate. I don't have a machine running Edge to test it on. It uses less than 20% of the CPU in Chrome on my 2012 MacBook Air. It's possible that it's a problem with my code, but in general the Web Audio API does not have very good cross-browser support.

Damn, I just played with the Plinth demo for an hour and totally lost track of time. Great stuff.

Thanks! After a certain point it felt like a dead end to me, so I dropped it in favor of exploring something more along the lines of a JS-based Max/MSP. But it is surprising how much fun can be had with the small number of modules available in Plinth.