by probablybetter 702 days ago
The "mess" of Linux audio is due to ONE reason: single-client ALSA driver model.

Every other layer is a coping mechanism, and the plural, divergent FOSS community has responded in various ways:

- JACK

- PulseAudio

- PipeWire

I am unclear why Jaroslav Kysela chose to make ALSA single-client, but Apple's CoreAudio multi-client driver model is the right way to do digital audio on general-purpose computing devices running multitasking OSes on application processors, in my opinion.

Current issues this article does not address that actually constitute large parts of the "mess" of Linux Audio:

- channel mapping that is neither transparent nor clearly assigned anywhere in userspace. (aka, why does my computer insist that my multi-input pro-audio interface is a surround-sound interface? I don't WANT high-pass filters on the primary L/R pair of channels. I am not USING a subwoofer. WTF)

- the lack of a STANDARD for channel mapping, as opposed to the ALSA config conventions, /etc/asound.conf etc.

- the lack of friendly nomenclature for hardware inputs/outputs in DAW software, whether at the ALSA layer or at some sound-server layer. (not to mention that ALSA calls an 8-channel audio interface "4 stereo devices")

- probably more, but I can't remember. My current audio production systems have the DAW software directly opening an ALSA device. I cannot listen to audio elsewhere until I quit my DAW. This works, and I can set my latency as low as the hardware will allow.

This is the thing: more than about 10 ms of latency is unacceptable for multitrack audio recording.
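
For a rough sense of what a 10 ms budget means in buffer terms, here is a minimal sketch of the arithmetic. It assumes ALSA-style terminology (a buffer made of several periods, each a number of frames); the function names are hypothetical, and the result covers only one buffer in one direction. Real round-trip latency also includes the input buffer, converters and driver overhead.

```python
def buffer_latency_ms(period_frames: int, periods: int, sample_rate_hz: int) -> float:
    """Latency contributed by one full buffer, in milliseconds."""
    return 1000.0 * period_frames * periods / sample_rate_hz


def max_period_frames(budget_ms: float, periods: int, sample_rate_hz: int) -> int:
    """Largest period size (in frames) whose whole buffer fits in budget_ms."""
    return int(budget_ms * sample_rate_hz / (1000.0 * periods))


if __name__ == "__main__":
    # Illustrative settings only: 2 periods per buffer at 48 kHz.
    for frames in (64, 256, 1024):
        print(frames, "frames ->", round(buffer_latency_ms(frames, 2, 48000), 2), "ms")
    print("largest period within 10 ms:", max_period_frames(10.0, 2, 48000), "frames")
```

Under these assumptions, 2 periods of 256 frames at 48 kHz already slightly exceed 10 ms, so a period of about 240 frames or fewer is needed to stay within the budget.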

6 comments

> The "mess" of Linux audio is due to ONE reason: single-client ALSA driver model.

This is one of the major reasons why Linux accessibility sucks IMO.

Audio is one thing that you need to "just work™" if you want to get accessibility right, as there's no way for a screen reader user to fix it without having working audio in the first place[1]. On Linux, it does not "just work", and different screen readers have different ideas on how they want audio to be handled. In particular, the terminal Speakup screen reader (with a softsynth) wants exclusive control of your device through ALSA IIRC, while the Orca screen reader for the GUI goes through Pulse. That makes it impossible to use both of them at the same time.

[1] Well, you can sort of fix it by having a second machine and SSHing into the broken one, but that's not what I mean.

> the terminal Speakup screen reader (with a softsynth) wants exclusive control of your device through ALSA

If you have PulseAudio or PipeWire, they add a plugin to the ALSA library that reroutes audio to the audio daemon, so ALSA applications should work correctly.

I would be surprised if Orca used Pulse directly; it uses speech-dispatcher (IIRC), which then uses PA if configured that way.

Also, Accessibility != Audio. I, for instance, use Braille only. No need for speech synthesis. So equating accessibility issues with the crazy audio stack is a little too simple.

I mean... I've never seen a single audio issue on Linux. It does "just work" in my experience. I realize the people citing issues in this thread aren't just making shit up for the fun of it, but I think there's a lot of going too far and saying it sucks for everyone when it seems to work just fine for most.

I disagree.

Applications want to receive or provide a stream (X sample rate, Y sample format, Z channels) and have it routed to the right destination, which is probably not configured with the same parameters. Making every application responsible for this conversion is not doable. Having the kernel handle it is not a good idea. The routing decisions need to be implemented somewhere too. And let's not ignore the complexity involved in format negotiation.
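
To make the conversion step concrete, here is a minimal sketch of what a userspace layer has to do when a client stream and a device disagree on format. Everything here is an illustrative assumption: the function names are hypothetical, the stream is mono signed 16-bit, and the resampler is naive linear interpolation, whereas real sound servers use properly filtered resamplers and negotiate formats first.

```python
from typing import List


def s16_to_float(samples: List[int]) -> List[float]:
    """Convert signed 16-bit integer samples to floats in [-1.0, 1.0)."""
    return [s / 32768.0 for s in samples]


def resample_linear(samples: List[float], src_rate: int, dst_rate: int) -> List[float]:
    """Naive linear-interpolation resampler (no anti-aliasing filter)."""
    if not samples:
        return []
    out_len = len(samples) * dst_rate // src_rate
    out = []
    for i in range(out_len):
        pos = i * src_rate / dst_rate          # fractional position in the source
        j = int(pos)
        frac = pos - j
        nxt = samples[min(j + 1, len(samples) - 1)]  # hold the last sample at the end
        out.append(samples[j] * (1.0 - frac) + nxt * frac)
    return out


def mono_to_stereo(samples: List[float]) -> List[float]:
    """Duplicate each mono sample into an interleaved L/R pair."""
    out = []
    for s in samples:
        out.extend((s, s))
    return out


def convert(samples_s16_mono: List[int], src_rate: int, dst_rate: int) -> List[float]:
    """Client format (mono s16 at src_rate) to device format (stereo float at dst_rate)."""
    return mono_to_stereo(resample_linear(s16_to_float(samples_s16_mono), src_rate, dst_rate))
```

Even this toy version touches sample format, rate and channel count, which is the point: doing it well, once, in a shared layer is more practical than repeating it in every application.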

The DAW scenario (pro-audio usage) is too specific to generalise from. It is the only kind of software that really cares about codec configuration, latencies and picking its own routing (or rather, letting the user pick the routing from the DAW GUI).

> I am unclear why Jaroslav Kysela chose to make ALSA single-client, but Apple's CoreAudio multi-client driver model is the right way to do digital audio on general-purpose computing devices running multitasking OSes on application processors, in my opinion.

Because ALSA is a different layer in the audio stack than CoreAudio.

ALSA corresponds to macOS drivers and I/O Kit.

CoreAudio (Audio Toolbox / Audio Unit) corresponds to PipeWire / PulseAudio.

But on the Mac side everyone is OK with using CoreAudio (with the accompanying set of daemons), while on Linux, for some reason, everyone wants to go as low-level as possible, "just open the device file", and then wonders why something is missing. Because you skipped that layer, that's why.

> current audio production systems have the DAW software directly opening an ALSA device.

I mean, I remember this being the case for a very long time on Windows with ASIO too, which is the only reasonable way to run a DAW with acceptable latency there. macOS has multi-client, but I was never able to get latency as low as on fine-tuned Windows and Linux systems, and in the end that's what matters: you just use your motherboard's chip for OS audio and your pro soundcard for the actual workload. PipeWire is very close to giving a good experience, but there'll always be some overhead. I'm making some art installations running various chains of audio effects on a Raspberry Pi Zero, and the difference between going through PipeWire, even when my app (https://ossia.io) is the only process producing any sound, and going straight to ALSA is night and day in terms of "how many reverbs I can stupidly chain before I hear a crack".

My PreSonus interface allows multiple applications to access it over ASIO simultaneously, while letting regular Windows audio through, at 16 samples of latency. ASIO does not mandate exclusive access; bad drivers do.

The single-client model is not bad: it does not require the kernel to do mixing or sample-rate conversion, which can be moved to userspace (which Windows does these days as well [1]). The less code in the kernel, the better.

[1] https://learn.microsoft.com/en-us/windows/win32/coreaudio/us...
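
As an illustration of the mixing that a single-client device model leaves to userspace, here is a minimal sketch. It assumes every client has already been converted to the same rate, channel layout and signed 16-bit format; `mix_s16` is a hypothetical name, and a real audio daemon would also handle per-stream volume, timing and underruns.

```python
from typing import List

S16_MIN, S16_MAX = -32768, 32767


def mix_s16(streams: List[List[int]]) -> List[int]:
    """Sum several signed 16-bit streams sample by sample.

    Shorter streams are treated as silent once they run out, and the sum is
    clamped so that loud passages saturate instead of wrapping around.
    """
    length = max((len(s) for s in streams), default=0)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(S16_MIN, min(S16_MAX, total)))
    return mixed


if __name__ == "__main__":
    daw = [1000, 30000, -30000]   # illustrative sample values only
    browser = [2000, 10000]
    print(mix_s16([daw, browser]))
```

The device still sees exactly one client (the daemon), and the mixed buffer is the only thing written to it.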

Is the single-client model planned to be addressed or fixed? Maybe there were previous attempts?

No, because there's nothing to fix (on the system side).

Apps should use the right API from the right layer; when they skip a layer, it is no wonder they miss whatever that layer provides. When they do not need exclusive access to the device and want to play nice with other apps, they should use PipeWire/PulseAudio.

For 99% of apps, using ALSA directly is the wrong approach. You don't use IOKit directly in Mac apps either.

PipeWire/PulseAudio install a plugin for the ALSA library so that audio from ALSA applications is rerouted to the audio daemon. So apps using ALSA can work on systems both with and without an audio daemon.
In this day and age there should not be a system without an audio daemon, at least not one that is not broken. Apps certainly should not accommodate broken systems while forcing workarounds on the correct ones.