by michaelrmmiller 1638 days ago
To add onto this very good explanation from Paul: in practice, audio processing graphs always rapidly decay into serial processing after starting out massively parallel. Most of the time, we are writing to a single output (i.e. speakers via your audio interface, or a file). On top of that, some of the heaviest DSP happens on this final output stage (mastering chains with multiband compressors, linear phase equalizers, limiters, etc.). So every 5.3 ms (a 256-sample buffer at a 48 kHz sample rate) we start out processing all the leaf nodes (audio tracks with sound files, virtual instruments and synths) massively in parallel, and end up bottlenecking as the tree collapses into a line. Then we are stuck doing some of the most CPU-intensive work on a single core, since we can’t process the limiter DSP plug-in until the EQ finishes its work, for example.

We need to meet our real-time deadline or risk dropping buffers and making nasty pops and clicks. That mastering stage can pretty easily be the limiting (hah) step that causes us to miss the deadline, even if we processed hundreds of tracks in parallel moments before in less time.
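To put rough numbers on that, here is a minimal sketch of the deadline arithmetic. The buffer size and sample rate are the ones from above; the per-track and per-plug-in costs are made-up figures purely for illustration, and `critical_path_ms` is a hypothetical helper that assumes there are enough cores to run every track at once.

```python
BUFFER_SIZE = 256      # samples per buffer, as in the comment above
SAMPLE_RATE = 48_000   # Hz, as in the comment above


def buffer_period_ms(buffer_size, sample_rate):
    """Time available to produce one buffer, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate


def critical_path_ms(track_costs_ms, master_chain_costs_ms):
    """Best-case wall time for one buffer.

    Assumption: tracks are independent leaves, so with enough cores they
    cost only as much as the slowest one. The master chain is strictly
    serial, so its stage costs add up.
    """
    return max(track_costs_ms) + sum(master_chain_costs_ms)


# Hypothetical costs (ms per buffer), not measurements.
tracks = [0.4] * 200           # 200 tracks, all processed in parallel
master = [1.8, 2.2, 1.4]       # e.g. multiband compressor, linear phase EQ, limiter

deadline = buffer_period_ms(BUFFER_SIZE, SAMPLE_RATE)
path = critical_path_ms(tracks, master)

print(f"deadline: {deadline:.2f} ms")
print(f"critical path: {path:.2f} ms")
print("missed deadline" if path > deadline else "made deadline")
```

With these invented numbers the 200 tracks contribute only the cost of the slowest one, while the three serial mastering stages alone take longer than the whole buffer period. Adding more cores would not change the result.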

The plug-in APIs (AudioUnits, VSTs, AAX) which are responsible for all the DSP and virtual instruments are also designed to process synchronously. Some plug-ins implement their own threading under the hood but this can often get in the way of the host application’s real-time processing. On top of that, because the API isn’t designed to be asynchronous, the host’s processing thread is tied up waiting for the completed result from the plug-in before it can move on.
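As a rough sketch of what "synchronous" means here: the host hands a buffer to each plug-in in turn and cannot continue until the call returns. The `Gain` and `HardClip` classes and the `run_chain` function below are hypothetical stand-ins, not the actual AudioUnit, VST or AAX interfaces.

```python
class Gain:
    """Hypothetical plug-in: multiplies every sample by a fixed factor."""

    def __init__(self, factor):
        self.factor = factor

    def process(self, buffer):
        return [s * self.factor for s in buffer]


class HardClip:
    """Hypothetical plug-in: a crude limiter that clamps samples to +/- limit."""

    def __init__(self, limit):
        self.limit = limit

    def process(self, buffer):
        return [max(-self.limit, min(self.limit, s)) for s in buffer]


def run_chain(plugins, buffer):
    """Host-side loop: each call blocks until the plug-in hands back its result."""
    for plugin in plugins:
        buffer = plugin.process(buffer)  # the next stage needs this output
    return buffer


samples = [0.25, 0.75, -0.9]
gain_then_clip = run_chain([Gain(2.0), HardClip(1.0)], samples)
clip_then_gain = run_chain([HardClip(1.0), Gain(2.0)], samples)

print(gain_then_clip)
print(clip_then_gain)
```

The two orderings give different output, which is why the second stage has to wait for the first to finish rather than run alongside it.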

Add to that the fact that many DSP algorithms are time-dependent. You can’t chop up the sample buffer into N different parts and process them independently, because the algorithm carries state from one sample to the next: the result for sample i+1 depends on processing sample i first.
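A one-pole low-pass filter is about the smallest example of this kind of dependency. The sketch below is illustrative only: the function name, the coefficient `a` and the test signal are assumptions chosen to keep the arithmetic easy to follow.

```python
def one_pole_lowpass(samples, a=0.5, state=0.0):
    """y[i] = a * x[i] + (1 - a) * y[i-1]; returns (output, final state)."""
    out = []
    y = state
    for x in samples:
        y = a * x + (1.0 - a) * y  # needs the previous output sample
        out.append(y)
    return out, y


buf = [1.0, 1.0, 1.0, 1.0]

# Reference: process the whole buffer in order.
whole, _ = one_pole_lowpass(buf)

# Naive split: treat the two halves as independent jobs, each starting
# from a zero state (what running them on separate cores would amount to).
first_half, _ = one_pole_lowpass(buf[:2])
second_half_naive, _ = one_pole_lowpass(buf[2:])
naive = first_half + second_half_naive

# Correct split: carry the state across the boundary, which forces the
# second half to wait for the first.
first_half, carried = one_pole_lowpass(buf[:2])
second_half, _ = one_pole_lowpass(buf[2:], state=carried)
chained = first_half + second_half

print(whole)    # [0.5, 0.75, 0.875, 0.9375]
print(naive)    # [0.5, 0.75, 0.5, 0.75]
print(chained)  # [0.5, 0.75, 0.875, 0.9375]
```

The naive split restarts the filter in the middle of the buffer and produces the wrong signal. The only way to get the right answer from two chunks is to run them one after the other, which is serial processing again.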