by diminish 3207 days ago
>> Can the ridiculous overeagerness of Web Audio be reversed? Can we bring back a simple “play audio” API

To be frank, the graphics world had a standard of sorts (OpenGL) a long time ago, alongside DirectX, so WebGL had a good example to follow. In the audio world, however, we haven't seen a cross-platform quasi-standard spec covering Mac, Linux and Windows. So IMHO, non-web audio also lacks common standards for mixing, sound engineering and music-making. That's why Web Audio appears to lack a use case. IMHO, that smells like opportunity.

I use Web Audio, in canvas-WebGL based games where music making is needed. I understand the issues - we definitely need more than "play" functionality.

4 comments

We've been through this multiple times. WASAPI, MME, DirectSound on Windows. CoreAudio on Mac. Libraries like SDL_mixer, FMOD, Wwise. We know how to construct a sound API. There's 20 years of prior art.

If you provide a low-level "play" API, others can build stuff on top because it's just numbers. Sure, sometimes there's "expensive numbers" like MP3 decoders, FFTs, etc., but these can be added as needed.

It's fairly easy to get PCM out on any one platform (which means you can build support for Win/Mac/Linux by writing that small piece of C code 3 times), and as Jasper_ noted, the rest is just math on some integers or floats, so there is nothing much platform-specific about it.
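To illustrate the "just math on some integers or floats" point, here is a minimal sketch in plain Python. The function names, the 44.1 kHz rate, and the 16-bit little-endian output format are assumptions chosen for the example, not part of any real audio API. Everything up to handing the bytes to a device is platform-independent arithmetic.

```python
import math
import struct

SAMPLE_RATE = 44100  # Hz; assumed for this example


def sine(freq_hz, seconds, amplitude=0.5, rate=SAMPLE_RATE):
    """Return float samples in [-1.0, 1.0] for a sine tone."""
    n = int(seconds * rate)
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


def mix(a, b):
    """Sum two signals sample by sample, clamping to [-1.0, 1.0]."""
    n = max(len(a), len(b))
    out = []
    for i in range(n):
        s = (a[i] if i < len(a) else 0.0) + (b[i] if i < len(b) else 0.0)
        out.append(max(-1.0, min(1.0, s)))
    return out


def to_pcm16(samples):
    """Pack float samples as little-endian signed 16-bit PCM bytes."""
    ints = [int(round(s * 32767)) for s in samples]
    return struct.pack("<%dh" % len(ints), *ints)


# Two tones mixed into one buffer, ready for whatever platform call plays PCM.
chord = mix(sine(440.0, 0.1), sine(660.0, 0.1))
pcm = to_pcm16(chord)
```

The only platform-specific part left is the call that accepts `pcm` and plays it, which is the small piece a low-level "play" API would have to provide.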

I think the bigger issue is that non-experts sometimes get tasked with adding support for things.

The "audio device API that leaves the sample rate completely unspecified" example is, believe it or not, one I've seen before elsewhere. And yet, if you know the first thing about PCM samples, you know this is a mind-numbingly stupid mistake to make. Yet it's a mistake that a few people have made into shipping products, because they can't or won't reason about audio, and this did not stop them from being in charge of an audio API.
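A small worked example of why an unspecified sample rate matters, as a hedged sketch with hypothetical helper names. The same block of samples has a different duration and pitch depending on the rate the device assumes.

```python
def duration_seconds(num_samples, sample_rate):
    """How long a block of samples lasts at a given playback rate."""
    return num_samples / sample_rate


def perceived_frequency(tone_hz, authored_rate, playback_rate):
    """Pitch heard when samples authored at one rate are played at another."""
    return tone_hz * playback_rate / authored_rate


n = 44100  # one second of audio authored at 44.1 kHz

at_authored_rate = duration_seconds(n, 44100)   # plays for exactly one second
at_other_rate = duration_seconds(n, 48000)      # finishes early at 48 kHz
shifted = perceived_frequency(440.0, 44100, 48000)  # the A440 tone comes out sharp
```

Without a stated rate, the caller cannot know which of these cases applies, so neither the length nor the pitch of the output is defined.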

Whether the API could be used to play MOD files is a good litmus test of its suitability for a variety of purposes. Covers repeatedly playing samples at differing volumes and pitches, simultaneously.
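What the MOD litmus test asks for can be sketched in a few lines. This is not how any particular player is written; the function names and the tiny eight-sample waveform are made up for illustration, and the lookup truncates to the nearest earlier sample where real players usually interpolate.

```python
def render_voice(sample, step, volume, out_len):
    """Play `sample` looped, advancing `step` source samples per output sample.

    step=1.0 is the original pitch, 2.0 is one octave up, 0.5 one octave down.
    """
    out = []
    pos = 0.0
    for _ in range(out_len):
        out.append(sample[int(pos) % len(sample)] * volume)
        pos += step
    return out


def mix_voices(voices):
    """Sum equally long voices sample by sample."""
    return [sum(frame) for frame in zip(*voices)]


# One stored sample, played three times at once at different pitches and volumes.
sample = [0.0, 0.5, 1.0, 0.5, 0.0, -0.5, -1.0, -0.5]
voices = [
    render_voice(sample, 1.0, 0.5, 16),
    render_voice(sample, 2.0, 0.25, 16),
    render_voice(sample, 0.5, 0.25, 16),
]
mixed = mix_voices(voices)
```

An API passes the test if it lets you feed `mixed` (or something equivalent) to the output continuously, or performs the same per-voice pitch and volume control itself.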

I'd rather have a comprehensive API that someone can dumb down than one that's so crippled as to be unusable beyond very basic functionality.

A FastTracker 2 player was discussed on HN quite some time ago (spoiler: uses ScriptProcessorNode): https://news.ycombinator.com/item?id=10538791

> However in the audio world we haven't seen a cross platform quasi-standard spec covering Mac, Linux and Windows.

OpenAL: https://www.openal.org/

Don't let the name fool you. OpenAL is a closed-source library, much like Wwise or FMOD or PortAudio, that just implements playback. Bizarrely enough, it is also the only one of these APIs that uses a similar "play this buffer" approach and suffers from the same issues as Web Audio's memory management, just without a GC.

The actual audio equivalent to OpenGL is OpenSL [0], which I don't think picked up any support from anybody.

[0] https://www.khronos.org/opensles/

PortAudio is MIT-licensed[0] and seems like a decent example of the primitives you need for audio.

Broadly, low-level audio APIs fall into 2 categories:

1. Callback-based - every time the underlying system has a new block of audio available and/or needs to be supplied with a new block, it calls your callback, which reads input data, does whatever processing you want, and writes output data

2. Stream-based - Inputs and Outputs are represented by streams. You can read from the input stream to record and write to the output stream to play back.

Both types of API can be used for low-latency audio, but you generally introduce a buffer of latency when you need to convert between them.
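The conversion cost can be seen in a toy adapter. This is a sketch with hypothetical names, not PortAudio's actual interface. It layers a stream-style `write()` over a callback-driven output using a FIFO, and whatever sits in that FIFO is the extra latency.

```python
from collections import deque

BLOCK = 4  # frames per device callback; tiny so the example is readable


class StreamOnCallback:
    """Stream-style write() layered over a callback-driven output."""

    def __init__(self):
        self.fifo = deque()

    def write(self, samples):
        """Stream side: the application pushes samples whenever it likes."""
        self.fifo.extend(samples)

    def callback(self, frames):
        """Callback side: the device pulls exactly `frames` samples.

        On underrun the remainder is padded with silence (0.0).
        """
        return [self.fifo.popleft() if self.fifo else 0.0 for _ in range(frames)]

    def queued_latency_seconds(self, sample_rate):
        """Latency added by samples waiting in the FIFO."""
        return len(self.fifo) / sample_rate


stream = StreamOnCallback()
stream.write([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
queued = stream.queued_latency_seconds(48000)  # six samples waiting
first = stream.callback(BLOCK)    # a full block from the FIFO
second = stream.callback(BLOCK)   # two real samples, then silence
```

A real adapter would also need thread-safe access to the FIFO, since the device callback typically runs on a separate high-priority thread.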

PortAudio lets applications choose which API they want to use.

[0] http://www.portaudio.com/license.html

OpenAL has multiple implementations, including the popular open source OpenAL Soft. They are not all closed source.

OpenAL does have a recording API so it isn't pure playback only.

But you are right in that the OpenAL scope is fairly limited. It was designed for games, particularly for rapid and frequent playback of simultaneous short sound effects. Because of this, the memory management issues you bring up are not often an issue. You load all the buffers you need at the beginning of the level and you keep reusing them without any more memory allocation/deallocation.

OpenSL ES was adopted by Android in 2.3 (API 9). However, they just recently seemed to invent yet another API, and seem to be leaving OpenSL behind.

> OpenAL is a closed-source library

So are all official OpenGL implementations (Mesa isn't official, last I heard). Doesn't stop them from being a standard and being used, although I agree they would be better if they were open source.

OpenGL has to talk to hardware and is implemented by the hardware vendor. OpenAL does not have multiple implementations, isn't provided by hardware vendors and just wraps the platform audio API.

I'm not sure of your point here. Not sure why multiple implementations is a benefit for users. Audio is generic and hardware is cheap enough that the operating systems just implement and include drivers. It's a cross platform library that meets audio needs, including 3D/spatial audio, much like (and designed like) OpenGL.