Given what I've seen in audio ML research:
1) Tuning the hyperparameters of your audio preprocessing is a pain if preprocessing is a separate CPU step done ahead of time. You have to redo the preprocessing every time you want to tune your audio feature hyperparams.
2) It's quite common to use torchaudio spectrograms, etc. purely because they are faster. (I can link to a handful of recent high-impact audio ML GitHub repos if you like.)
3) If you use nnAudio, you can actually backprop through the STFT or mel filters and tune them if you like. That said, this is not so commonplace.
4) Sometimes the audio is GENERATED by a GPU. For example, in a neural vocoder, you decode the audio from a mel spectrogram to a waveform. Then you compute the loss between the mel spectrograms of the true and predicted audio. You can't do this with these C++ features. (Again, I can link a handful of recent high-impact audio ML GitHub repos if you like.)
Again, I just don't get it.
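To make point 4 concrete: the vocoder's loss is computed on spectrograms of audio the model has just generated, so feature extraction has to run inside the training graph rather than as an offline preprocessing step. Below is a minimal pure-Python sketch of what such a spectral loss computes. It assumes a simplified linear-frequency STFT magnitude (no mel filterbank) and small illustrative defaults (`n_fft=8`, `hop=4`). `stft_magnitude` and `spectral_l1_loss` are hypothetical names, not from any library. In a real training loop you would compute the equivalent with `torch.stft` or `torchaudio.transforms.MelSpectrogram`, so autograd can carry gradients back into the generator.

```python
import cmath
import math


def stft_magnitude(signal, n_fft=8, hop=4):
    """Naive STFT magnitude with a periodic Hann window (pure Python, O(n_fft^2) per frame)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / n_fft) for n in range(n_fft)]
    frames = []
    # Slide a window of n_fft samples over the signal in steps of `hop`.
    for start in range(0, len(signal) - n_fft + 1, hop):
        chunk = [signal[start + n] * window[n] for n in range(n_fft)]
        # Real input: keep only the non-negative frequency bins 0..n_fft//2.
        frame = []
        for k in range(n_fft // 2 + 1):
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / n_fft)
                      for n in range(n_fft))
            frame.append(abs(acc))
        frames.append(frame)
    return frames


def spectral_l1_loss(pred, target, n_fft=8, hop=4):
    """Mean absolute difference between STFT magnitudes of predicted and target audio.

    Hypothetical simplification: linear-frequency bins instead of a mel filterbank.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    p = stft_magnitude(pred, n_fft, hop)
    t = stft_magnitude(target, n_fft, hop)
    total, count = 0.0, 0
    for frame_p, frame_t in zip(p, t):
        for a, b in zip(frame_p, frame_t):
            total += abs(a - b)
            count += 1
    if count == 0:
        raise ValueError("signal is shorter than n_fft; no frames to compare")
    return total / count
```

Because `n_fft` and `hop` are plain arguments evaluated on the fly, changing them requires no re-run of an offline preprocessing pass, which is point 1 in miniature.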
The point is, ship it.
Seriously, nobody is lugging a GPU around to interact with their most frequently used micro-computing platform: their headphones, which already represent a new and extraordinary era of "accelerated component" market expansion.
The 7 microphones in your earpiece and the 6 speakers pushing air into your head are perhaps not as close to a GPU as they would need to be, but they already have a DSP, and there is already a silicon battle going on among the vendors.
>You can't do this with these C++ features.
Yes, and I think the point, in the end, is to use AI to write better C++ code and to design better, cheaper, smarter silicon, as always (and to actually ship it).