Hacker News
by ssfrr 711 days ago
Definitely good to keep in mind. The thing that I think is really interesting about audio programming is that you need to be deterministically fast. If your DSP callback executes in 1ms 99.99% of the time but sometimes takes 10ms, you’re hosed.

I would love to see a modern take on the real-world risk of various operations that are technically nondeterministic. I wouldn’t be surprised if there are cases where the risk of >1ms latency is like 1e-30, and dogmatically following this advice might be overkill.


> dogmatically following this advice might be overkill

It depends on your appetite for risk and the cost of failure.

A big part of the problem is that general purpose computing systems (operating systems and hardware) are not engineered as real-time systems and there are rarely vendor guarantees with respect to real-time behavior. Under such circumstances, my position is that you need to code defensively. For example, if your operating system memory allocator does not guarantee a worst-case bound on execution time, do not use it in a real-time context.

I don't mean to devalue the advice here. I think it's spot on, and I unreservedly recommend this article to folks who want to learn about writing reliable audio software.

I think in essence I'm repeating the comments of Justin from Cockos, which you summarize [1]:

> It is basically saying that you can reduce the risk of priority inversion to the point where the probability is too low to worry about.

In that comment you also say:

> 100% certainty can’t be guaranteed without a hard real-time OS. However 5ms is now considered a relatively high latency setting in pro/prosumer audio circles

Which I interpret as acknowledging that we're already forced into the regime of establishing an acceptable level of risk.

My point is that I would love to see more data on the actual latency distributions we can expect, so that we can make more informed risk assessments. For example, I know that not all `std::atomic` operations are lock-free, but when the critical section is so small, is it really a problem in practice? I want histograms!

[1]: http://www.rossbencina.com/code/real-time-audio-programming-...
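Here is a minimal C++17 sketch of the kind of measurement being asked for. It times an operation many times and buckets the durations into a log2 histogram, so rare slow outliers stay visible instead of disappearing into an average. `LatencyHistogram` and `measure` are hypothetical names, not taken from the article. Clock reads have their own cost, so very short operations mostly end up measuring `steady_clock` itself. Numbers taken on an idle machine at normal priority also say little about a loaded audio thread. The `is_always_lock_free` constants answer the "is this `std::atomic` lock-free?" question at compile time for the current target.

```cpp
#include <array>
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>

// Hypothetical log2 histogram of durations in nanoseconds: bucket k counts
// durations in [2^k, 2^(k+1)) ns; bucket 0 also takes 0 ns.
struct LatencyHistogram {
    std::array<std::uint64_t, 40> buckets{};

    void record(std::uint64_t ns) {
        std::size_t k = 0;
        while (ns > 1 && k + 1 < buckets.size()) { ns >>= 1; ++k; }
        ++buckets[k];
    }

    std::uint64_t total() const {
        std::uint64_t sum = 0;
        for (std::uint64_t c : buckets) sum += c;
        return sum;
    }
};

// Hypothetical helper: time `op` `iterations` times with steady_clock.
// The cost of reading the clock is included in every sample.
template <typename F>
LatencyHistogram measure(F&& op, int iterations) {
    assert(iterations >= 0);
    LatencyHistogram hist;
    for (int i = 0; i < iterations; ++i) {
        const auto t0 = std::chrono::steady_clock::now();
        op();
        const auto t1 = std::chrono::steady_clock::now();
        const auto ns = std::chrono::duration_cast<std::chrono::nanoseconds>(t1 - t0).count();
        hist.record(ns > 0 ? static_cast<std::uint64_t>(ns) : 0);
    }
    return hist;
}

// C++17: compile-time answer for this target. If false, std::atomic<T> may
// use a lock internally.
constexpr bool kAtomicFloatLockFree = std::atomic<float>::is_always_lock_free;
constexpr bool kAtomicDoubleLockFree = std::atomic<double>::is_always_lock_free;
```

To get a histogram, you could run something like `measure([&] { flag.store(true); }, 1'000'000)` on the audio thread under realistic load, where `flag` is a `std::atomic<bool>`. The tail buckets are the interesting part.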

> If your DSP callback executes in 1ms 99.99% of the time but sometimes takes 10ms, you’re hosed.

I tend to agree, but...

From my recollection of using Zoom, it has this bizarre but workable recovery method for network interruptions. Either the server or the client keeps some amount of the last input audio in a buffer. If the server detects connection problems at time 't', it grabs the buffer from t - 1 seconds all the way until the server detects better connectivity. Then it starts a race, playing back that amount of the buffer to all clients at something like 1.5x speed. From what I remember, this algo typically wins the race and saves the client from having to repeat themselves.

That's not happening inside a DSP routine. But my point is that some clever engineer(s) at Zoom realized that missing deadlines in audio delivery does not necessarily mean "hosed." I'm also going to rankly speculate that every other video conferencing tool hard-coupled missing deadlines with "hosed," and that's why Zoom is the only one where I've ever experienced the benefit of that feature.
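If the scheme works roughly as remembered here (an assumption, since this is a recollection rather than a documented Zoom algorithm), the catch-up time is simple arithmetic. While the backlog plays at rate r, live audio keeps arriving at 1x, so the backlog shrinks by (r - 1) seconds per second. A hypothetical helper:

```cpp
#include <cassert>

// Hypothetical model of the catch-up scheme described above: `backlog_s`
// seconds of buffered audio (history plus outage) play at `rate` x real time
// while new audio keeps arriving at 1x, so the backlog drains at (rate - 1).
double catch_up_seconds(double backlog_s, double rate) {
    assert(rate > 1.0 && "playback must be faster than real time to catch up");
    return backlog_s / (rate - 1.0);
}
```

At 1.5x, a 1 s backlog takes 2 s to drain. A 3 s backlog (1 s of history plus a 2 s outage) takes 6 s.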

The context for this article is writing pro audio software, where that kind of distortion would generally be as bad as a dropout, if not worse.
Yeah, 5ms is the threshold for noticeability as far as latency in pro audio. It's like frame rate for pro gamers. The problem is that your target user is highly specialized, outside the norms by a large margin. What makes audio even more difficult is that sub-ms issues can cause phase and frequency distortion that can become even more noticeable than latency alone.
1. you do not need to be a highly specialized target user to detect latency between pressing a key on a MIDI keyboard and the corresponding sound being produced.

2. 3ms is typical in-air latency between a typical DAW user and their near-field monitors, so claims about sensitivity to times much lower than 5msec should be taken with some skepticism

3. In live contexts, many drum + bass pairings have more than 10ms of air latency between them, so ditto #2

4. On the other hand, no good reason to add to latency

5. For performance purposes, jitter is much worse than latency. Pipe organ players rapidly learn to deal with even whole seconds of latency, but almost nobody can deal with jitter (essentially, variable, unpredictable latency)

6. There are no sub-ms issues that will cause phase and frequency distortion. Those come from DSP errors, not handling of latency, which is just about always a constant, fixed feature of the data signal path. You may be thinking of stuff like comb filtering, but this is not related to the latency in the signal path in a correct setup.
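Points 2 and 3 rest on the speed of sound: roughly 343 m/s in room-temperature air, or about 2.9 ms per metre. The sketch below compares that with the latency of one audio buffer. The helper names are hypothetical.

```cpp
#include <cassert>

constexpr double kSpeedOfSoundMps = 343.0;  // dry air at about 20 degrees C

// Hypothetical helper: acoustic propagation delay over `distance_m` metres, in ms.
double air_latency_ms(double distance_m) {
    assert(distance_m >= 0.0);
    return distance_m / kSpeedOfSoundMps * 1000.0;
}

// Hypothetical helper: latency of one buffer of `frames` samples at
// `sample_rate_hz`, in ms.
double buffer_latency_ms(int frames, double sample_rate_hz) {
    assert(frames > 0 && sample_rate_hz > 0.0);
    return frames / sample_rate_hz * 1000.0;
}
```

Some example figures:

- 1 m to the near-fields is about 2.9 ms.
- About 3.4 m between drummer and bass amp is about 10 ms.
- A single 256-frame buffer at 48 kHz is about 5.3 ms.

A real round trip usually includes both an input and an output buffer plus converter delay, so the buffer figure is a lower bound.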

The "MIDI timing" problem was often a combination of MIDI traffic limitations with limited CPU in the receiver.

What started off as a four note chord would be smeared out a little by MIDI, especially in the early days until everyone worked out that putting MIDI for an entire studio down a single cable was a bad idea.

Then you'd get some more smearing in the target synth CPU as the incoming notes were parsed. Then perhaps some more delay for each note, because it took a while to send trigger and pitch messages to the hardware. Even more if there were software envelopes involved and they had to be initialised.

This is still a problem with VSTs, on a smaller scale. There's some finite amount of processing that has to be done before sound starts being generated. Usually it's not very much, but there's always the possibility that two notes that should start in the same 5ms buffer slot will be spread across two of them because one note is just a little too late.

This isn't as objectionable as glitching, but it can still affect the timing feel, and - depending on the patch design - cause phasing effects between the notes.
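Here are some numbers for the MIDI part. DIN MIDI runs at 31,250 bit/s and frames each byte with a start bit and a stop bit. Every byte therefore costs 10 bits, or 0.32 ms. A Note On is 3 bytes. With running status, consecutive Note Ons on the same channel can omit the status byte and take only 2. A sketch with hypothetical function names:

```cpp
#include <cassert>

constexpr double kMidiBaud = 31250.0;  // DIN MIDI 1.0 serial rate, bits per second
constexpr int kBitsPerByte = 10;       // 1 start + 8 data + 1 stop bit

// Hypothetical helper: time for `bytes` bytes to cross a DIN MIDI cable, in ms.
double midi_wire_ms(int bytes) {
    assert(bytes >= 0);
    return bytes * kBitsPerByte / kMidiBaud * 1000.0;
}

// Hypothetical helper: spread between the end of the first and the end of the
// last Note On of an n-note chord on one channel. With running status, each
// Note On after the first omits its status byte (2 bytes instead of 3).
double chord_spread_ms(int notes, bool running_status) {
    assert(notes >= 1);
    const int bytes_per_later_note = running_status ? 2 : 3;
    return midi_wire_ms((notes - 1) * bytes_per_later_note);
}
```

On the wire, a four-note chord spreads over about 2.9 ms without running status and about 1.9 ms with it.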

1. MIDI traffic limitations are rarely the issue. The chord smearing that some people claim to be able to hear is not because of traffic but because the protocol is a serialized stream of individual note on/note off messages, and thus by definition there is no possible way for every message to arrive at the same time. However, the actual delays between a set of note on messages caused by the protocol is small enough that it is in the same range as human performance on both keyboards and string instruments. Note that MIDI has no collision detection or ACK-style replies, and you do not use "a single cable" for MIDI unless you have only 1 sender and 1 receiver. If it is a DAW sending "a lot" of MIDI to some external MIDI hardware, the only issues arise if the total amount of data to be sent exceeds the serial capacity of the hardware layer. This is not impossible to make happen, but even so-called black MIDI faces a challenge when doing this, even with classic (DIN) serial MIDI.

2. "parsing incoming notes" does not cause more smearing. Block-sized processing of audio causes a delay which is the "performance latency" that people complain about. It does not change the ordering or interval between note onsets.

3. the "finite amount of processing that has to be done before sound starts being generated" is irrelevant in a block processing architecture (which is used these days by all DAWs and all plugin APIs). As long as the plugin gets its work done within the time represented by the block, there is no additional latency caused by the plugin. If it doesn't, then there's a click anyway.

4. "there's always the possibility that two notes that should start in the same 5ms buffer slot will be spread across two of them". No, there isn't. If that happens, that's a coding error in either the plugin host or the plugin or both. But also, time is continuous. If the notes are supposed to be 3msec apart, it doesn't matter if they are 3msec apart within the same buffer/process cycle, or in two consecutive ones.
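Point 4 describes sample-accurate event handling. Plugin APIs generally timestamp each event with a sample offset, and the plugin starts the note at that offset inside the block rather than at the block start. The sketch below is minimal: `NoteEvent`, `process_block` and `render` are hypothetical, and "rendering" is reduced to writing an impulse at each onset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical note event, timestamped in samples on a continuous timeline.
struct NoteEvent {
    std::int64_t sample_time;
};

// Hypothetical block callback for samples [block_start, block_start + frames).
// Each event inside the block starts at its exact offset within the buffer, so
// the spacing between onsets does not depend on where block boundaries fall.
void process_block(const std::vector<NoteEvent>& events, std::int64_t block_start,
                   std::size_t frames, std::vector<float>& out) {
    assert(out.size() >= frames);
    for (std::size_t i = 0; i < frames; ++i) out[i] = 0.0f;
    for (const NoteEvent& e : events) {
        const std::int64_t offset = e.sample_time - block_start;
        if (offset >= 0 && offset < static_cast<std::int64_t>(frames))
            out[static_cast<std::size_t>(offset)] = 1.0f;  // impulse at the onset
    }
}

// Offline test scaffolding (it allocates, so not for the audio thread): render
// `blocks` consecutive blocks and concatenate them into one timeline.
std::vector<float> render(const std::vector<NoteEvent>& events, std::size_t frames,
                          std::size_t blocks) {
    std::vector<float> timeline;
    std::vector<float> block(frames);
    for (std::size_t b = 0; b < blocks; ++b) {
        process_block(events, static_cast<std::int64_t>(b * frames), frames, block);
        timeline.insert(timeline.end(), block.begin(), block.end());
    }
    return timeline;
}
```

Take 256-sample blocks at 48 kHz with onsets at samples 200 and 344 (3 ms apart). The two onsets fall in different blocks, but they still come out 144 samples apart in the rendered timeline.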

I don't know. When it comes to real-time audio... imagine a huge festival with a giant wall of speakers blasting at the audience. If the audio playback glitches and you get something like a 22kHz buzz (alternating two samples), that is a lot of fried ears.
This scenario is the stuff of nightmares for me!

When you have 100k people paying anywhere from $500 to sky's-the-limit, failure is not an option. Increasingly, audio engineers and subsequently performers are at the mercy of the latest junior developers, who don’t have to live with the failures of their shortsightedness. Grimes’ Coachella set is a case in point, wholly due to Pioneer ignoring their users for over a decade. Sometimes we don’t have 3 days to copy files to a USB drive, but I digress.

Apparently, failure was an option. Just not a very popular one.
Grimes's failure was still pleasant compared to the mayhem you get when the DSP inside the amplifier system glitches for a few samples.

What do you think happens when a dense crowd of 500+ people suddenly starts to have excruciating ear pain?

Is this something that happens often or are you simply speculating?
My understanding is that in practice, for very large shows, electronic musicians have fully redundant computer setups running in parallel and some hardware that will switch over instantly if one fails.

For example, here is one rig:

https://www.reddit.com/r/ableton/comments/7y2u3o/ableton_mai...

It uses a Radial SW8 to automatically switch between the redundant machines if one flakes out:

https://www.radialeng.com/product/sw8

If failure is not an option you bring 2 computers to every gig, burn CDs, and bring your vinyls.

Grimes is a "dj" who does not understand the software. Fixing that problem is one fucking click on the interface.

Heh, I thought it was odd you referenced a ten-year-old show, but I guess she made a similar mistake twice. Her 2014 Coachella set was a total mess.
But you'll never be 100% sure. Most musicians aren't willing to pay for NASA-level QA and custom hardware running an RTOS, and even that doesn't guarantee perfect software.

We're always dealing with risk and trade-offs. Maybe you avoid a locking `atomic` synchronization point by implementing a more complicated lock-free ringbuffer, but in the process you introduce some other bug that has you dumping uninitialized memory into the DAC.
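For reference, the core of a single-producer/single-consumer lock-free ring buffer is small. The risk described above lives in the details: memory ordering, the one-producer/one-consumer contract, and telling full from empty. The sketch below is minimal and not production code. `SpscRing` is a hypothetical name, there is no cache-line padding, and `T` must be default-constructible and copy-assignable.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

static_assert(std::atomic<std::size_t>::is_always_lock_free,
              "index atomics must be lock-free for this to be real-time safe");

// Hypothetical SPSC ring buffer: exactly one thread calls push() and exactly
// one other thread calls pop(). One slot stays empty to tell full from empty.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity) : buf_(capacity + 1) {
        assert(capacity > 0);
    }

    // Producer thread only. Returns false instead of blocking when full.
    bool push(const T& value) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t next = (head + 1) % buf_.size();
        if (next == tail_.load(std::memory_order_acquire)) return false;  // full
        buf_[head] = value;
        head_.store(next, std::memory_order_release);  // publish the element
        return true;
    }

    // Consumer thread only (e.g. the audio callback). Returns false when empty.
    bool pop(T& out) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        if (tail == head_.load(std::memory_order_acquire)) return false;  // empty
        out = buf_[tail];
        tail_.store((tail + 1) % buf_.size(), std::memory_order_release);  // free the slot
        return true;
    }

private:
    std::vector<T> buf_;                // allocated once, in the constructor
    std::atomic<std::size_t> head_{0};  // next slot to write (producer-owned)
    std::atomic<std::size_t> tail_{0};  // next slot to read (consumer-owned)
};
```

The acquire/release pairs carry the correctness. Weaken either side and you get the "uninitialized memory into the DAC" class of bug, which a single-threaded test will never show.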

I think the advice in TFA is totally reasonable and worth following. I'm just saying that there may be cases where it's OK to violate some of these rules. I'd love to see more data to help inform those decisions.

This isn't even in opposition to the article, which says explicitly:

>Some low-level audio libraries such as JACK or CoreAudio use these techniques internally, but you need to be sure you know what you’re doing, that you understand your thread priorities and the exact scheduler behavior on each target operating system (and OS kernel version). Don’t extrapolate or make assumptions

Off topic, but does TFA still mean "The Fucking Article"? In my personal understanding it came from people telling others to "read tfa". But seeing the term used ubiquitously to refer to "the article" while keeping the profanity just seems kind of strange to me. We could say something like "TA" and omit the "fucking", but maybe it actually means something completely different and my personal lore has just detached from the zeitgeist.
I think it does mean "the fucking article", but I also think a lot of people use it as "the featured article". I agree with you though, it's a bit confusing sometimes as the less nice usage is still also common hahaha.
The real fun is optimising maths. Remove all divisions. Create LUTs, approximations, CPU-specific tricks. Even though CPUs are orders of magnitude faster now, they are still slow for real-time processing.
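Here is the "remove all divisions" trick in its simplest form: compute one reciprocal outside the per-sample loop and multiply inside it. The function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: scale a buffer by 1/divisor with one division total.
// x * (1/d) can differ from x / d in the last bit, which is why compilers only
// make this substitution on their own under flags such as -ffast-math (or when
// d is a known power of two).
void scale_by_inverse(float* samples, std::size_t n, float divisor) {
    assert(divisor != 0.0f);
    const float inv = 1.0f / divisor;                      // the only division
    for (std::size_t i = 0; i < n; ++i) samples[i] *= inv;  // multiplies only
}
```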
Real time does not mean fast; it means deterministic.

Thus such micro-optimizations are seldom used. Quite the opposite: you try to avoid jitter, which can be caused by caches.

While real-time does not mean fast, micro-optimisations are frequently used. No one likes slow DSP audio software.
> No one likes slow DSP audio software.

And then there's Diva at its highest output quality setting... :)

Yes, I did think twice about posting that precisely because of Diva.
Jitter does not matter if deadlines are met. It only matters if it can cause deadlines to be missed (sometimes).
If you have a buffer that's being clocked out and your goal is to keep data flowing, the jitter is going to influence how small your buffer can be. Let's say you're producing 56kHz audio; the best you can do is produce a [sample] exactly at that frequency. If you have 1ms of jitter, you now need a 1ms buffer, so you have delay. If jitter is small enough, like 0.1ns of jitter in some SIMD calculation, then for all intents and purposes it doesn't matter for an audio application...
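Here is the arithmetic behind this. A timing uncertainty of j seconds spans j x fs sample periods, so the buffer needs at least that much slack, rounded up to whole samples. A sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: how many sample periods `jitter_s` seconds spans.
double jitter_in_samples(double jitter_s, double sample_rate_hz) {
    assert(jitter_s >= 0.0 && sample_rate_hz > 0.0);
    return jitter_s * sample_rate_hz;
}

// Hypothetical helper: minimum extra buffering, in whole samples, so a producer
// that is up to `jitter_s` late never lets the output run dry.
long min_slack_samples(double jitter_s, double sample_rate_hz) {
    return static_cast<long>(std::ceil(jitter_in_samples(jitter_s, sample_rate_hz)));
}
```

At the 56 kHz rate used above, 1 ms of jitter costs 56 samples of slack. By contrast, 0.1 ns is about 0.0000056 of a sample and rounds up to the one-sample minimum.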
You've just restated my point. If the deadlines are met, jitter doesn't matter. Ergo, you can't meet deadlines if your jitter is too large. Otherwise, it doesn't matter.
Wouldn't the deadline be now+zero for real time audio applications? If I'm building a guitar pedal (random example) ideally I want no delay from the input to the output. Any digital delay makes things strictly worse and so any jitter matters. That said, the difference between zero and very close to zero does become a moot point given small enough values for any practical purpose.
Basically "It doesn't matter when it doesn't matter".
> Create LUTs

This has been slower than raw computation for most things for well over a decade (probably more like two).

If there are complex equations involved, it absolutely is faster. You can also create intermediate LUTs, so the tables are small and fit in cache and then do interpolation on the fly.
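Here is a sketch of a small table with on-the-fly linear interpolation. It is built once outside the audio thread, and each lookup is a few arithmetic operations and two loads. `InterpolatedLut` is a hypothetical name. The tanh example, the [-4, 4] range and the 1024-entry size are arbitrary illustration choices; in practice the table would hold something far more expensive to compute.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical table approximating f on [lo, hi] with linear interpolation
// between entries. Inputs outside the range are clamped to the end values.
class InterpolatedLut {
public:
    template <typename F>
    InterpolatedLut(F f, double lo, double hi, std::size_t size)
        : lo_(lo), hi_(hi), scale_((size - 1) / (hi - lo)), table_(size) {
        assert(size >= 2 && hi > lo);
        for (std::size_t i = 0; i < size; ++i)
            table_[i] = f(lo + static_cast<double>(i) / scale_);  // expensive part, done once
    }

    double operator()(double x) const {
        if (x <= lo_) return table_.front();
        if (x >= hi_) return table_.back();
        const double pos = (x - lo_) * scale_;
        const std::size_t i = static_cast<std::size_t>(pos);       // pos > 0, so this is floor
        const std::size_t j = (i + 1 < table_.size()) ? i + 1 : i;  // guard the last entry
        const double frac = pos - static_cast<double>(i);
        return table_[i] + frac * (table_[j] - table_[i]);
    }

private:
    double lo_, hi_, scale_;
    std::vector<double> table_;
};

// Hypothetical example: a tanh soft-clipper table.
inline InterpolatedLut make_tanh_lut() {
    return InterpolatedLut([](double x) { return std::tanh(x); }, -4.0, 4.0, 1024);
}
```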
Not at all. When you work with DSP, even nowadays using LUTs is very common and usually faster.

You are not storing a sin table, but the results of very complex differential equations.

Yeah, isn’t hitting memory (especially if it can’t fit in L1/L2 cache) one of the biggest sources of latency? Especially since on modern CPUs it is almost impossible to max out the arithmetic units outside of microbenchmarks?
You don't really do these any more on a modern CPU. This is stuff I used to do 30 years ago, and you might still do it if you're on a microcontroller or some other tiny system. The CPUs aren't slow. The main problem is that if the OS doesn't schedule your process, it doesn't matter how fast the CPU is.
This is great fun! But it's much more prevalent in embedded DSP than desktop.
> deterministically fast

Indeed, like all real-time systems you need to think in terms of worst-case time complexity, not amortized complexity.
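A standard illustration: `std::vector::push_back` is amortized O(1), but any single call that exceeds `capacity()` reallocates. That call costs O(n) time plus a trip into the allocator. The sketch below reserves the worst case up front and makes the real-time path refuse rather than grow. `EventQueue` is a hypothetical name, and `int` stands in for whatever the real event type would be.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical fixed-capacity queue for use inside an audio callback.
class EventQueue {
public:
    // Call from a non-real-time thread: this is where allocation happens.
    explicit EventQueue(std::size_t max_events) {
        assert(max_events > 0);
        events_.reserve(max_events);
    }

    // Real-time safe: never reallocates, so worst case equals typical case.
    bool try_push(int event) {
        if (events_.size() == events_.capacity()) return false;  // would reallocate
        events_.push_back(event);
        return true;
    }

    void clear() { events_.clear(); }  // keeps capacity; nothing is freed
    std::size_t size() const { return events_.size(); }

private:
    std::vector<int> events_;
};
```

The same reasoning applies to the allocator point made earlier in the thread. The cost is paid once, outside the callback, where an unbounded worst case is acceptable.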

Use of Ethernet in real-time systems: packet loss, collision rate and jitter are "good enough", so it became an acceptable replacement for e.g. ATM.
Yes. Most modern Ethernet isn't running on shared media (i.e. there are no collisions) and for the most part no packet loss as long as there's no congestion. For networks and for the CPU, when you're fast enough the jitter matters less, if the cpu or the network "takes a break" (from the application perspective), it tends to be a very short break on really fast networks or cpus. e.g. if a packet gets in front of you in 10Mbps Ethernet that's a big deal for an audio application but a packet ahead of you in 10Gbps Ethernet isn't much of a delay for audio. 1ms vs. 1us sort of thing.
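The "1ms vs. 1us" figure is serialization delay, the time needed to clock a frame onto the link. It is also roughly how long you wait if one full-size frame is already being sent ahead of yours on that link. A sketch with a hypothetical helper name:

```cpp
#include <cassert>

// Hypothetical helper: time to clock `frame_bytes` onto a link running at
// `link_bits_per_s`, in microseconds.
double serialization_us(double frame_bytes, double link_bits_per_s) {
    assert(frame_bytes > 0.0 && link_bits_per_s > 0.0);
    return frame_bytes * 8.0 / link_bits_per_s * 1e6;
}
```

A 1500-byte frame takes 1.2 ms at 10 Mbit/s and 1.2 us at 10 Gbit/s. Preamble, header and inter-frame gap add a few percent, which does not change the comparison.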


Or you use AVB/TSN which gives you stronger guarantees, but requires cooperation of all bridges (switches).