by akira2501 3207 days ago
> I strongly agree with the author.

I will second this. I wanted to make a live streaming playback feature using the API so I could remotely monitor an audio matrix/routing system that I have in the office.

The API has _zero_ provision for streaming MP3. You either load and play back a complete MP3 file or you get corrupted playback, because the API simply won't maintain state between decoding calls.

What I ended up having to do was write a port of libMAD to JavaScript and then use that to produce a PCM stream, which I _could_ then convert into an AudioBuffer, attach a timer, and then send into the audio API for correct playback.
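
For anyone curious what the "attach a timer" part might look like, here is a minimal sketch in TypeScript. It is not the original code: it assumes the PCM chunks have already been decoded (for example by a libMAD port), and `ChunkScheduler`, `AudioClockLike`, and the 50 ms lead time are hypothetical names and values chosen for illustration. It only works out when each chunk should start so that consecutive chunks play back to back.

```typescript
// Minimal clock interface; a real AudioContext provides `currentTime` in seconds.
interface AudioClockLike {
  readonly currentTime: number;
}

// Hypothetical helper: computes the start time of each decoded PCM chunk so
// that consecutive chunks are scheduled without gaps.
class ChunkScheduler {
  private nextStart = 0;
  private clock: AudioClockLike;
  private sampleRate: number;
  private leadSeconds: number;

  constructor(clock: AudioClockLike, sampleRate: number, leadSeconds = 0.05) {
    this.clock = clock;
    this.sampleRate = sampleRate;
    this.leadSeconds = leadSeconds; // assumed safety margin before playback
  }

  // Returns the time (in clock seconds) at which a chunk of `frameCount`
  // sample frames should start playing.
  schedule(frameCount: number): number {
    const earliest = this.clock.currentTime + this.leadSeconds;
    // After an underrun the planned start is in the past, so restart slightly
    // in the future instead.
    const startAt = Math.max(this.nextStart, earliest);
    this.nextStart = startAt + frameCount / this.sampleRate;
    return startAt;
  }
}

// In a browser (not run here), each mono chunk `pcm: Float32Array` would go:
//   const buf = ctx.createBuffer(1, pcm.length, sampleRate);
//   buf.copyToChannel(pcm, 0);
//   const src = ctx.createBufferSource();
//   src.buffer = buf;
//   src.connect(ctx.destination);
//   src.start(scheduler.schedule(pcm.length));
```

Scheduling against the audio clock rather than a JavaScript timer is the design choice here; a timer only needs to fire often enough to keep a chunk or two queued ahead.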

Which is an insane amount of work for a gaping oversight in a common use-case of the API; a simple flag in the browser's native decoder would've sufficed.


> The API has _zero_ provision for streaming MP3.

Did you look into Media Source Extensions[0,1]? Fetching and playing the various audio formats is a bit outside the purview of Web Audio. But you can feed streaming MSE into Web Audio. If I recall, you use Web Audio's `AudioContext.createMediaElementSource()` to use a (potentially chunked) MSE source with Web Audio, but it's been a while since I did this.

That said, Media Source Extensions (MSE) is only supported on relatively modern browsers (IE11+), but you should be able to use it to stream MP3 to the Web Audio API on supported browsers.
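
A rough TypeScript sketch of the MSE side, with the usual caveat that it's been a while. A `SourceBuffer` rejects `appendBuffer()` while a previous append is still in progress, so chunks need to be queued and fed in on `updateend`. `AppendQueue` and `SourceBufferLike` are hypothetical names; the interface only mirrors the few real `SourceBuffer` members used, and whether a given browser accepts `audio/mpeg` in MSE is an assumption you would need to check with `MediaSource.isTypeSupported()`.

```typescript
// The subset of the real SourceBuffer interface this sketch relies on.
interface SourceBufferLike {
  readonly updating: boolean;
  appendBuffer(data: ArrayBuffer): void;
  addEventListener(type: "updateend", listener: () => void): void;
}

// Hypothetical helper: serialises appends so that appendBuffer() is never
// called while the SourceBuffer is still processing the previous chunk.
class AppendQueue {
  private pending: ArrayBuffer[] = [];
  private sourceBuffer: SourceBufferLike;

  constructor(sourceBuffer: SourceBufferLike) {
    this.sourceBuffer = sourceBuffer;
    sourceBuffer.addEventListener("updateend", () => this.flush());
  }

  push(chunk: ArrayBuffer): void {
    this.pending.push(chunk);
    this.flush();
  }

  private flush(): void {
    if (this.sourceBuffer.updating) return; // try again on the next updateend
    const next = this.pending.shift();
    if (next !== undefined) this.sourceBuffer.appendBuffer(next);
  }
}

// Browser wiring (not run here):
//   const audio = new Audio();
//   const mediaSource = new MediaSource();
//   audio.src = URL.createObjectURL(mediaSource);
//   mediaSource.addEventListener("sourceopen", () => {
//     const queue = new AppendQueue(mediaSource.addSourceBuffer("audio/mpeg"));
//     // call queue.push(chunk) for every chunk fetched from the stream
//   });
//   ctx.createMediaElementSource(audio).connect(ctx.destination);
```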

There's also a way to do this without using MSE for older browsers. See the 72lions repo below for an example[2]. It's a bit convoluted, but not as much work as your workaround. As described in the README of the 72lions proof-of-concept:

"The moment the first part is loaded then the playback starts immediately and it loads the second part. When the second part is loaded then then I create a new AudioBuffer by combining the old and the new, and I change the buffer of the AudioSourceNode with the new one. At that point I start playing again from the new AudioBuffer."

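The "combining the old and the new" step can be sketched roughly as below. This is not code from the 72lions repo; `appendChannelData` is a hypothetical helper that works on raw per-channel sample arrays, which in a browser you would read with `AudioBuffer.getChannelData()` and write back into a fresh buffer from `AudioContext.createBuffer()` using `copyToChannel()`.

```typescript
// Hypothetical helper: concatenates two sets of per-channel PCM samples,
// old samples first, new samples after them.
function appendChannelData(
  oldChannels: Float32Array[],
  newChannels: Float32Array[],
): Float32Array[] {
  if (oldChannels.length !== newChannels.length) {
    throw new Error("channel count mismatch");
  }
  return oldChannels.map((oldData, i) => {
    const newData = newChannels[i];
    const combined = new Float32Array(oldData.length + newData.length);
    combined.set(oldData, 0);              // existing audio at the front
    combined.set(newData, oldData.length); // newly decoded audio after it
    return combined;
  });
}
```

Note that every combine step copies all the audio received so far, so the cost grows with each chunk; that is fine for a proof of concept but worth keeping in mind for long streams.
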
0. https://developer.mozilla.org/en-US/docs/Web/API/Media_Sourc...

1. http://dalecurtis.github.io/llama-demo/index.html

2. https://github.com/72lions/PlayingChunkedMP3-WebAudioAPI

> Fetching and playing the various audio formats is a bit outside the purview of Web Audio

Just looking at that clause makes me think perhaps the Web Audio API should have been called something else.

Can you imagine writing "fetching and displaying various image formats is a bit outside the purview of HTML"?

(I realize that's a bit apples 'n oranges.)

I think it's more like "fetching and displaying various image formats is outside the purview of HTML5 canvas".

If you want to just show an image, you use an <img> tag; if you want to just play an audio file, you use <audio>. Canvas and the Web Audio APIs are for pages that want to make or mix their own images/audio. Though to be fair, HTML/JavaScript do make it easy to load image data from an image tag directly into a canvas; maybe there's a missing parallel for audio.

If I recall correctly (we did that project a year and a half ago), MSE either wasn't available at the time or its latency was entirely unacceptable. I should have noted that with the setup I described above we are able to achieve <150ms of latency in most cases; and as the system also allows remote control of matrix sources and mixers, low latency was required in order to accurately manipulate the system under certain working conditions.

And I'll third it.

The MP3 issues don't end there, which is something the article touches on obliquely: you can't reuse many of the important constructs you might want to.

Here's my use case. I have a couple of games (https://arcade.ly/games/starcastle, https://arcade.ly/games/asteroids), each of which has three pieces of music: title screen, in game, and game over. If you play the game a couple of times you're going to hear the title screen audio probably once, in game twice or more (because it loops from the beginning after every playthrough), and game over twice. To put it simply: I need to play the same MP3s multiple times each.

To play an MP3 you have to decode it, which is an expensive operation. Firstly it takes time to decode - enough time that the user will notice the lag even on a fast machine. However the main problem is the amount of memory use: decoding takes you from a couple of MB of compressed MP3 to potentially hundreds of MB of uncompressed audio. The problem worsens for multiple tracks.

I discovered the memory issues via Chrome Task Manager, when I noticed my page using hundreds of MB of native memory, and traced this usage back to the music. You can often get away with this when running on a desktop browser, but not so much on mobile.

You can mitigate the memory issue to some extent by dropping the sample rate of your uncompressed PCM audio from 44.1 kHz to 22.05 kHz, which obviously halves its uncompressed size. Quality starts to suffer too much for music if you go much below this though. (Note here that I'm talking about the uncompressed sample rate, and NOT the MP3 bitrate. A 44.1 kHz MP3 encoded at 64 kbps and one encoded at 128 kbps will decompress to the same size, although the 64 kbps version will obviously sound worse because more information will have been lost.)
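
To put rough numbers on this, here is a back-of-the-envelope sketch in TypeScript. It assumes decoded audio is held as 32-bit float samples (which is what the `AudioBuffer` interface exposes; a browser's internal storage may differ) and a constant-bitrate MP3; the 3-minute stereo track and the function names are made up for illustration.

```typescript
// Decoded size in bytes, assuming 32-bit float samples.
function decodedBytes(seconds: number, sampleRate: number, channels: number): number {
  const BYTES_PER_SAMPLE = 4; // Float32
  return seconds * sampleRate * channels * BYTES_PER_SAMPLE;
}

// Approximate size in bytes of a constant-bitrate MP3 (ignores tags and headers).
function mp3Bytes(seconds: number, kbps: number): number {
  return (seconds * kbps * 1000) / 8;
}

const trackSeconds = 180; // hypothetical 3-minute stereo track

console.log("MP3 at 128 kbps:", mp3Bytes(trackSeconds, 128));
console.log("MP3 at 64 kbps:", mp3Bytes(trackSeconds, 64));
console.log("Decoded at 44.1 kHz:", decodedBytes(trackSeconds, 44100, 2));
console.log("Decoded at 22.05 kHz:", decodedBytes(trackSeconds, 22050, 2));
```

Under those assumptions the 128 kbps file is about 2.9 MB compressed and about 63.5 MB decoded at 44.1 kHz, whichever bitrate it was encoded at, and about 31.8 MB at 22.05 kHz. A handful of tracks like that gets you into the hundreds of MB.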

But the inability to reuse a source buffer, which holds compressed audio, is absolutely aggravating, and something I've posted at length about here: https://github.com/WebAudio/web-audio-api/issues/1175. The reason you might want to do this is that you're then only using as much memory as the compressed audio takes up and (hopefully) the rest will have been freed by the browser's runtime (no guarantees, obviously).

The downside of keeping only the compressed audio and re-decoding it for each play is that you can't start a piece of music at a defined instant, which is extremely frustrating when you might want to synchronise it with events happening on screen.

Also, because the music is re-decoded every time and decoding is asynchronous, I've now introduced a weird bug where it's possible to end up with both title and in game music playing at the same time if the user starts the game before the title music has finished decoding. It's fixable (although I haven't had time yet), but it's just one more irritation with a poorly designed API.
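
One possible fix, sketched in TypeScript: give every music request a generation number and ignore any decode that completes after a newer request has been made. This is only an illustration, not the game's actual code. `MusicSwitcher` is a hypothetical name, and the `decode` and `play` callbacks are stand-ins; in a browser, `decode` would wrap something like `AudioContext.decodeAudioData()` and `play` would stop the current track before starting the new one.

```typescript
// Hypothetical helper: makes sure only the most recently requested track is
// ever handed to `play`, however the asynchronous decodes happen to finish.
class MusicSwitcher<T> {
  private generation = 0;
  private decode: (track: string) => Promise<T>;
  private play: (decoded: T) => void;

  constructor(decode: (track: string) => Promise<T>, play: (decoded: T) => void) {
    this.decode = decode;
    this.play = play;
  }

  // Resolves to true if the track was played, false if a newer request
  // superseded it while it was still decoding.
  async switchTo(track: string): Promise<boolean> {
    const mine = ++this.generation;
    const decoded = await this.decode(track);
    if (mine !== this.generation) return false; // stale result: drop it
    this.play(decoded);
    return true;
  }
}
```

With this in place, starting the game while the title music is still decoding means the title decode is simply discarded when it eventually finishes.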

I'm actually thinking of going back to using the good old HTML5 AUDIO element just for playing music, since it seems a bit more reliable, but I need to do some experimentation to see what the memory impact is. I also had issues with AUDIO misbehaving quite badly in Firefox with multiple sounds playing simultaneously.

Sound effects are less of an issue because they're obviously quite short and therefore don't take an excessive amount of memory even when uncompressed, so I can at least keep buffer sources around for them. Nonetheless the API's excessive complexity shows through even here: why is it such a drama just to play a sound? Why do I need to create and connect a bunch of objects together just to play a single sound at a given volume? Ridiculous. Asinine.
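
For reference, this is roughly the object graph in question, sketched in TypeScript. The `*Like` interfaces are hypothetical minimal stand-ins shaped after the real `AudioContext`, `AudioBufferSourceNode`, and `GainNode` members used (`createBufferSource()`, `createGain()`, `connect()`, `start()`, `gain.value`), so the function can be read without a browser; `playSound` itself is not part of the API.

```typescript
// Minimal stand-ins for the Web Audio objects involved.
interface NodeLike {
  connect(destination: unknown): unknown;
}
interface GainLike extends NodeLike {
  gain: { value: number };
}
interface SourceLike<B> extends NodeLike {
  buffer: B | null;
  start(when?: number): void;
}
interface ContextLike<B> {
  destination: unknown;
  createBufferSource(): SourceLike<B>;
  createGain(): GainLike;
}

// Hypothetical one-call helper: plays an already decoded buffer once at the
// given volume. Buffer source nodes are one-shot, so a new one is created
// for every play; only the decoded buffer is reused.
function playSound<B>(ctx: ContextLike<B>, buffer: B, volume: number): SourceLike<B> {
  const source = ctx.createBufferSource();
  const gain = ctx.createGain();
  source.buffer = buffer;
  gain.gain.value = volume;
  source.connect(gain);          // source -> gain
  gain.connect(ctx.destination); // gain -> speakers
  source.start();
  return source;
}
```

Three objects and two connections for "play this at 25% volume", which is more or less the complaint.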