| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by datenwolf 3589 days ago

PulseAudio is not a replacement for ALSA. At the moment ALSA is the only kernel level API for talking with the hardware drivers that's fully supported by upstream (= the kernel devs).

You _can_ install OSS4 if you want to, at the expense of loosing proper power management support; not a big issue on Desktops, but it can reduce a laptops battery runtime significantly.

Either way PulseAudo does not know how to talk to audio hardware by itself (well, it sort of does as far as Bluetooth attached devices is concerned, but the whole Linux Bluetooth stack "BlueZ" is quite finicky on its own). It needs (just like Jack, esd, aRTs, Xaudio* and all the other sound servers out there (*= dead and burried)) some way to talk to the hardware. And that, at the moment usually is ALSA.

When ALSA was first developed the original plan was to implement all the must haves (mixing, resampling) though the userspace module support. Unfortunately it ended up in an unholy mess and never worked satisfyingly. To anyone who's spent considerable amount of time developing audio software (in hindsight) that's not a big surprise. Audio raises very hard and tight deadlines. The way ALSA dmix implements mixing through IPC mechanisms is extremely prone to buffer underruns because of some participating process not getty enough CPU cycles in time to complete to write out the next bunch of audio.

PA does plaster over the biggest cracks in the foundations of the Linux low level audio infrastructure. And given the circumstances it does an admirable job there; if it doesn't break things even more. But it's only treating the symptoms and not curing the underlying issues.

Here's the laundry list of what's required to fix the major bumps in the road (which also PA has to navigate):

- major simplification and cleanup of the kernel side driver model. Too many things are left optional for the drivers to implement

- tight requirements on how kernel drivers must behave (there's a lot of variation in the runtime behavior of certain drivers; some return immediately from read write functions, others have significant delay, timestamps reported may be increment coarsely based on a per-buffer-length base, others deliver smooth playhead/recordhead position updates)

- mixing / fanning _must_ happen in or close to the kernel (the reason for that is explained in the next point). The biggest issue here is that resampling may be required. As long as it's upsampling it's rather unproblematic actually (the only DSP challenge there is not creating artifact frequencies). Downsampling is a much harder problem, because it involves lowpass filtering, i.e. discarding information and the challenge is only to affect the stuff above the Nyquist frequency.

- the audio infrastructure must be able to reschedule processes based on the deadlines and time left until new samples are required. Audio is probably the most demanding of the soft realtime applications. A single missed sample is easily noticeable even to the most untrained pairs of ears. If your buffers underrun you get pops or crackles. However on an interactively used system acceptable audio timing mismatches are in the order of 5ms before an untrained brain is able to pick them up. Raising audio applications to realtime priority alleviates the issues, but the better solution is to keep statistics on the audio frame lengths read/write by a process, how much time it takes for a process to prepare the next frame and use that to augment the scheduling. However this raises the issue being able to circumvent priority privileges through the audio system; a mitigation would be that frame sizes are quantized to a minimum frame size that correponds to the amout of CPU time the process would be alotted in regular scheduling (that's a tough one and I've been strugling with it for some time).

When it comes to OSs with a monolithic kernel a very good question is, if the last two parts absolutely have to happen in the kernel or if it's possible to have them in userspace. If we want to implement it in userspace, then, because of the rescheduling issues this will require the addition of userspace based rescheduling capabilities the syscalls required for this.

Of course with a proper audio infrastructure present through standard kernel and/or driver interfaces the need for a system like PulseAudio vanishes; at least as far as mixing and resampling are concerned. It's still desireable to have an audio manager process that knows how to decide on which device to play certain audio (telephone calls go to the headset, music goes to the speakers and so on).