by Mizza 2091 days ago
I'm very eager to see GPU acceleration make its way into audio production, which is all still heavily CPU bound.

A free GPU FFT implementation will certainly help! Great work.


https://en.wikipedia.org/wiki/AMD_TrueAudio I believe AMD did that, but little to no software actually makes use of it.
It's not gonna happen: audio is much less throughput-intensive but a lot more latency-sensitive.
You can read off a GPU in 10 µs, which is just a single sample at 96 kHz.
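The arithmetic behind that: one sample period is the reciprocal of the sample rate. A minimal sketch, taking the 10 µs read figure from the comment as given (it is not measured here):

```python
def sample_period_us(sample_rate_hz: float) -> float:
    """Duration of one sample, in microseconds."""
    return 1_000_000 / sample_rate_hz

# Assumed GPU read latency, quoted from the comment above rather than measured.
GPU_READ_US = 10.0

for rate in (44_100, 48_000, 96_000):
    period = sample_period_us(rate)
    print(f"{rate} Hz: {period:.2f} us per sample; "
          f"a {GPU_READ_US:.0f} us read spans {GPU_READ_US / period:.2f} samples")
```

At 96 kHz a sample lasts about 10.4 µs, so a 10 µs read costs a little under one sample.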

If your entire stack lived in the GPU, and you're just reading out the result, this is trivial.

If you're constantly copying buffers back and forth because some effects are implemented in the CPU and some in the GPU, not so much!
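A rough way to see why mixing CPU and GPU effects hurts: every time consecutive stages in a chain live on different devices, the buffer has to be copied. A hypothetical cost model, assuming audio enters and leaves through the CPU and a fixed per-copy cost (`per_copy_us` is an illustrative parameter, not a measured value):

```python
def count_transfers(chain: list[str]) -> int:
    """Count host<->device copies for one buffer passing through an effect chain.

    Assumes the buffer starts and ends on the CPU (where audio I/O lives);
    each change of device between consecutive stages is one copy.
    """
    path = ["cpu", *chain, "cpu"]
    return sum(1 for a, b in zip(path, path[1:]) if a != b)


def transfer_overhead_us(chain: list[str], per_copy_us: float) -> float:
    """Total copy time per buffer under the fixed-cost-per-copy assumption."""
    return count_transfers(chain) * per_copy_us


print(count_transfers(["gpu", "gpu", "gpu"]))         # whole stack on the GPU: 2 copies
print(count_transfers(["cpu", "gpu", "cpu", "gpu"]))  # interleaved effects: 4 copies
```

The count grows with every CPU/GPU boundary, not with the number of GPU effects, which is why keeping the whole stack on one side matters.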

It's probably the case that a full-stack GPU implementation would blow what we have out of the water, but you'd lose your entire ecosystem in the process, so it's probably never going to happen.

Sony is trying something along those lines with the PS5: one compute unit on the GPU, with a few features fused off, is dedicated to audio.
What's the latency for integrated GPUs?
Crystalwell had a shared CPU/GPU L4 cache with ~50ns latency. I don't think there was a programming model where you could bounce data back and forth that fast, but I don't see a reason why the hardware wouldn't be capable of it.
I would think a GPU might help if you have a lot of audio channels and a lot of effects on each channel.
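Many channels with many effects is the data-parallel shape GPUs like: the same operation applied to a batch of independent buffers. A sketch of that shape using NumPy on the CPU, convolving every channel's block with its own impulse response in one batched FFT call; GPU FFT libraries generally offer batched transforms for the same pattern (cuFFT's batched plans, for instance), but nothing below runs on a GPU:

```python
import numpy as np


def batched_fft_convolve(blocks: np.ndarray, impulses: np.ndarray) -> np.ndarray:
    """Linear convolution of each row of `blocks` with the matching row of `impulses`.

    blocks: (channels, n), impulses: (channels, m) -> (channels, n + m - 1).
    """
    out_len = blocks.shape[1] + impulses.shape[1] - 1
    size = 1 << (out_len - 1).bit_length()  # next power of two, avoids circular wrap-around
    # One FFT per axis-1 row, all channels in a single call.
    spectrum = np.fft.rfft(blocks, size, axis=1) * np.fft.rfft(impulses, size, axis=1)
    return np.fft.irfft(spectrum, size, axis=1)[:, :out_len]


rng = np.random.default_rng(1)
audio = rng.standard_normal((64, 128))   # 64 channels, one 128-sample block each
irs = rng.standard_normal((64, 32))      # a hypothetical 32-tap filter per channel
wet = batched_fft_convolve(audio, irs)
```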

But even if that is not the case, machine learning is making its way into music production tools more and more. No doubt a beefy GPU will be useful to a lot of music production professionals, at least in the future, as the tools they use lean more heavily on ML.

Why do you think it's not going to happen? And for which use case?

The time budget to refresh a video frame at 120 Hz is about 8.3 ms if everything else came for free; in practice it's closer to 4 ms or less. So even looking at close to worst-case conditions, that's about the delay of sound traveling a meter or so, which should be fine for a lot of real-life applications.
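Checking those numbers: the frame budget is the reciprocal of the refresh rate, and the acoustic comparison uses the speed of sound in air (assumed here to be 343 m/s, roughly room temperature):

```python
SPEED_OF_SOUND_M_S = 343.0  # dry air at about 20 degrees C (assumption)


def frame_budget_ms(refresh_hz: float) -> float:
    """Time available per frame at a given refresh rate."""
    return 1000.0 / refresh_hz


def acoustic_distance_m(delay_ms: float) -> float:
    """How far sound travels in air during the given delay."""
    return SPEED_OF_SOUND_M_S * delay_ms / 1000.0


budget = frame_budget_ms(120)
for delay in (budget, 4.0):
    print(f"{delay:.2f} ms is sound travelling {acoustic_distance_m(delay):.2f} m")
```

At 4 ms sound covers about 1.4 m, and over a full 8.3 ms frame nearly 3 m.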

Audio processing is real-time, which means that you cannot miss your deadline. If you do miss it you get audible glitches, whereas in graphics you just get a slowdown. For that reason, audio code is written in a very particular, real-time-safe style that avoids locks, allocations, syscalls, and anything else that is not guaranteed to return within a bounded amount of time.
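One common building block of that style is a single-producer/single-consumer ring buffer: storage is allocated once up front, and the audio callback only reads two counters and copies samples, never waiting on a lock. A sketch of the pattern in Python for readability; CPython itself is not real-time safe, and in C++ or Rust the counters would be atomics with acquire/release ordering:

```python
from array import array


class SpscRingBuffer:
    """Single-producer/single-consumer ring buffer over preallocated storage."""

    def __init__(self, capacity: int) -> None:
        self._capacity = capacity
        self._buf = array("f", [0.0]) * capacity  # allocated once, up front
        self._read = 0   # total samples consumed (written only by the consumer)
        self._write = 0  # total samples produced (written only by the producer)

    def push(self, sample: float) -> bool:
        """Producer side: return False instead of waiting when the buffer is full."""
        if self._write - self._read == self._capacity:
            return False
        self._buf[self._write % self._capacity] = sample
        self._write += 1
        return True

    def pop_into(self, out: array) -> int:
        """Consumer side (audio callback): fill `out` in place, return real samples copied.

        On underrun the rest of `out` is filled with silence rather than blocking.
        """
        n = 0
        while n < len(out) and self._read < self._write:
            out[n] = self._buf[self._read % self._capacity]
            self._read += 1
            n += 1
        for i in range(n, len(out)):
            out[i] = 0.0
        return n
```

The key design choice is that a full or empty buffer is reported, not waited on: the callback always returns on time, and an underrun becomes silence instead of a stall.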

How long the deadline is depends on your buffer size and sample rate. To my ear, buffer sizes of >128 samples (at a sample rate of 44.1 kHz) have detectable latency (although the amount of latency will also depend on how many applications are in your signal chain). At 128 samples you have just under 3 ms to do your processing.
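The deadline is simply buffer size divided by sample rate. A small helper to tabulate it for common buffer sizes at 44.1 kHz:

```python
def buffer_deadline_ms(buffer_size: int, sample_rate_hz: float) -> float:
    """Time available to produce one buffer before the audio device needs it."""
    return 1000.0 * buffer_size / sample_rate_hz


for size in (64, 128, 256, 512):
    print(f"{size:>4} samples @ 44.1 kHz: {buffer_deadline_ms(size, 44_100):.2f} ms")
```

128 samples works out to about 2.90 ms; each doubling of the buffer doubles both the deadline and the added latency.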

Also note that for graphics, the output is the GPU itself. So you don't need to wait for the output to move back to the CPU, it's already where it needs to be.

But let's keep some realistic context. The audio output is under 1 MB/s. You could push that much over original PCI (not Express), where a ~1 ms delay meant "everything must be broken, reset the whole bus". Pushing audio samples both ways over PCIe will not be an issue.
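The bandwidth claim is easy to check: a stream's byte rate is sample rate × channels × bytes per sample. The sketch assumes stereo 32-bit float at 96 kHz as a generous case and compares it with conventional 32-bit/33 MHz PCI, whose theoretical peak is about 133 MB/s:

```python
def stream_bytes_per_second(sample_rate_hz: int, channels: int, bytes_per_sample: int) -> int:
    """Raw PCM data rate of an uncompressed audio stream."""
    return sample_rate_hz * channels * bytes_per_sample


stereo_float_96k = stream_bytes_per_second(96_000, 2, 4)  # 32-bit float stereo
cd_quality = stream_bytes_per_second(44_100, 2, 2)        # 16-bit stereo

PCI_PEAK_BYTES_S = 133_000_000  # conventional 32-bit/33 MHz PCI, theoretical peak

# "Both ways" doubles the traffic; it is still on the order of 1% of the old bus.
print(f"{2 * stereo_float_96k / PCI_PEAK_BYTES_S:.2%} of PCI peak for a round trip")
```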

https://www.cycfi.com/2019/04/gpu-dsp-latency/

> PCI-E 3.0 standard guarantees data transfer for 4 kb data with 1-2 μsec (3-10 round trip).

Copying CPU-GPU-CPU:

> size: 8192 bytes, time: 4.72 us,

This should not be a meaningful impact in any audio workflow.
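Putting the quoted benchmark next to the 128-sample deadline at 44.1 kHz shows the scale. The 4.72 µs figure is taken from the benchmark quoted above; note that its 8192-byte transfer holds 2048 float32 samples, more than one 128-sample buffer:

```python
ROUND_TRIP_US = 4.72  # CPU->GPU->CPU copy of 8192 bytes, from the quoted benchmark
BUFFER = 128
SAMPLE_RATE = 44_100

deadline_us = 1_000_000 * BUFFER / SAMPLE_RATE  # about 2902 us
share = ROUND_TRIP_US / deadline_us
print(f"round trip uses {share:.3%} of a {BUFFER}-sample deadline")
```

That is roughly 0.16% of the budget, which supports the point that the copy itself is not the bottleneck.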

What OS are we talking about? It sounds like you'd need a real-time OS.
macOS has the ability to prioritize audio threads to avoid scheduling misses [0]. Linux has a set of patches [1] to give it soft real-time capabilities. I don't know as much about Windows, but I assume it has similar capabilities.
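On Linux, the usual first step is asking the scheduler for a real-time policy such as SCHED_FIFO. A minimal sketch using Python's `os.sched_setscheduler`; the priority of 70 is an arbitrary example, and the call only succeeds with CAP_SYS_NICE or an rtprio limit configured for the user, so the function reports failure instead of raising:

```python
import os


def request_realtime_priority(priority: int = 70) -> bool:
    """Try to switch the calling process to SCHED_FIFO at the given priority.

    Returns False where the policy is unavailable (e.g. non-Linux platforms)
    or the request is refused (missing CAP_SYS_NICE / rtprio limit).
    """
    if not hasattr(os, "sched_setscheduler") or not hasattr(os, "SCHED_FIFO"):
        return False
    try:
        os.sched_setscheduler(0, os.SCHED_FIFO, os.sched_param(priority))
    except OSError:  # PermissionError when not allowed, EINVAL for a bad priority
        return False
    return True
```

A real audio application would do this only for its audio thread, and pair it with the allocation-free callback style described earlier; the priority alone does not prevent deadline misses.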

And it's not like this is ABS firmware, where someone might die if you miss your deadline, and where hard real-time OSes are used. But you do get glitches in the audio stream, which in performance contexts is still pretty bad.

[0] https://developer.apple.com/documentation/audiotoolbox/workg... [1] https://wiki.archlinux.org/index.php/Professional_audio#Real...

What are you talking about? Most people working with sound today use a DAW like Ableton, Cubase or FruityLoops (FL Studio), which run on Windows or sometimes macOS.
The comment before said you cannot miss a deadline; I didn't know that was possible on Windows or macOS.
Could it be possible to “prerender” the audio on the GPU when it’s not being worked on (say, a track not being edited)? Then just play that track if it’s not edited before the user hits play?
This is a classic way of reducing CPU usage: bounce part of a track to raw audio and play it back, so it doesn't need to be rendered in real time. A GPU doesn't really change the equation there.
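The bounce-and-replay idea (often called "freezing" a track) is essentially a render cache invalidated on edit. A hypothetical sketch; `FreezeCache` and the `render` callback are illustrative names, not any DAW's API:

```python
from typing import Callable, Dict, List


class FreezeCache:
    """Hypothetical freeze/bounce cache: render a track once, replay it until edited."""

    def __init__(self, render: Callable[[str], List[float]]) -> None:
        self._render = render  # the expensive render, wherever it runs (CPU or GPU)
        self._frozen: Dict[str, List[float]] = {}

    def mark_edited(self, track: str) -> None:
        """Drop the bounced audio; the next playback re-renders the track."""
        self._frozen.pop(track, None)

    def playback(self, track: str) -> List[float]:
        """Return bounced samples, rendering only if the track has no valid bounce."""
        if track not in self._frozen:
            self._frozen[track] = self._render(track)
        return self._frozen[track]
```

Where the render happens is irrelevant to the cache, which is the point made above: a GPU could speed up the bounce, but playback of frozen audio is already cheap.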

Some synthesis methods rely on FFTs that can't really be computed well in real time on the CPU (PadSynth, PaulStretch); I'm hoping this will help with those.
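For context on why PadSynth is FFT-heavy, here is a simplified sketch of the idea: each harmonic is spread into a Gaussian bump in a very large spectrum, random phases are assigned, and one big inverse FFT produces the sample. The parameter names and the 1/h harmonic rolloff are illustrative choices, and this runs with NumPy on the CPU:

```python
import numpy as np


def padsynth(n: int = 2**18, sample_rate: float = 44_100.0, f0: float = 261.6,
             bandwidth_cents: float = 40.0, harmonics: int = 32, seed: int = 0) -> np.ndarray:
    """Render one seamlessly looping PadSynth-style sample of length n."""
    bins = np.arange(n // 2 + 1) / n  # bin frequency as a fraction of sample_rate
    amplitude = np.zeros(n // 2 + 1)
    for h in range(1, harmonics + 1):
        # Each harmonic becomes a Gaussian bump whose width grows with frequency.
        bw_hz = (2.0 ** (bandwidth_cents / 1200.0) - 1.0) * f0 * h
        width = bw_hz / (2.0 * sample_rate)
        centre = f0 * h / sample_rate
        x = (bins - centre) / width
        amplitude += np.exp(-x * x) / width / h  # 1/h: simple sawtooth-like rolloff
    amplitude[0] = 0.0  # no DC offset
    rng = np.random.default_rng(seed)
    phases = rng.uniform(0.0, 2.0 * np.pi, amplitude.shape)
    # One large inverse real FFT; the result is periodic over n, so it loops cleanly.
    wave = np.fft.irfft(amplitude * np.exp(1j * phases), n)
    return wave / np.max(np.abs(wave))  # normalise peak to 1
```

The single inverse transform over hundreds of thousands of bins is the expensive step, which is why this is normally rendered ahead of time rather than per note in real time.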

I've heard credible claims that GPUs these days (esp. TPUs) have lower latency for big models than CPUs. I haven't really investigated, but I could see it happening if you give the TPU a huge L1 cache or something.
Perhaps for large calculations? Otherwise the PCI transfer delay would be a big latency hit?
Yeah, until TPUs can directly communicate with the sound card, it sounds slow.