Hacker News
by svantana 2236 days ago
End-to-end modelling is very enticing for the lazy engineer; unfortunately, parameter control (knobs) is an important feature of most audio effects, and sampling enough of the parameter space becomes prohibitive for more complex effects. That's why the traditional approach is divide-and-conquer.
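To put rough numbers on "prohibitive": if each knob is captured at a fixed set of positions, the number of captures is the product of the per-knob step counts, so it grows exponentially with the number of knobs. A quick sketch; the 11 positions per knob and the one minute per capture are arbitrary assumptions, not figures from the thread:

```python
from math import prod

def grid_size(steps_per_knob):
    """Number of captures needed to sample every combination of knob positions."""
    return prod(steps_per_knob)

# Assumption: every knob is captured at 11 positions (0, 1, ..., 10).
print(grid_size([11]))      # 11 (one-knob effect)
print(grid_size([11] * 5))  # 161051 (five-knob effect)

# Assumption: one minute of capture per setting -> days of capture time.
print(round(grid_size([11] * 5) / (60 * 24), 1))  # 111.8
```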

Also, I don't think this approach will work well with time-varying effects such as chorus, although I'm happy to be proven wrong.


Not saying that it is in any way practical or useful in the real world, but I think there are approaches which are more geared towards what we might understand as 'emulating' rather than modelling an effect. It seems that emulators can be learned from data with surprising efficiency [1]. These would be amenable to parameter control.

[1] https://arxiv.org/abs/2001.08055

Sadly, I believe you will be proven correct.

What that neural network learns is basically an approximation of a static impulse response. So while it can simulate linear time-invariant effects such as reverb quite nicely, it'll surely have issues with chorus.

Reverb is time invariant? You can set custom decay time, rate, etc., so one note can be heard for, say, 10 seconds if you want to go full Devin Townsend. I'd think chorus would work better.

I wanted to do a very similar project, but with an overdrive. Let's see if I get time anytime soon!

>Reverb is time invariant?

You might want to familiarize yourself with [0]. Time-invariance is a specific property of a system, where the output (for any given input) does not depend on whether the input signal happens now, 1 second from now, or 100 years from now (except for the corresponding delay). Most reverb models are, to a first approximation, time invariant, because the effect will have the same sound for the same guitar line, no matter when you play the line.

Chorus, on the other hand, has a (perhaps subtle) modulator to get that warbly (scientific word!) sound. It doesn't feel like a time-based effect, but it certainly is and that makes it quite a lot more difficult to mimic with a system that (as others have noted) boils down to an impulse response.
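The definition above can be checked mechanically: delay the input by d samples and see whether the output is just the original output delayed by d. A minimal Python sketch, assuming a direct-form convolution as the stand-in for an IR reverb and a sine-modulated gain (a hypothetical toy, not a real chorus) as the stand-in for a modulated, time-varying system. Passing for one test signal doesn't prove time invariance; failing once disproves it.

```python
import math

def convolve(x, h):
    """Direct-form convolution: output of an LTI system (e.g. an IR reverb) with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def modulated_gain(x, rate=0.05):
    """Hypothetical time-varying system: gain follows a sine LFO tied to the absolute sample index."""
    return [xn * (1.0 + 0.5 * math.sin(2 * math.pi * rate * n)) for n, xn in enumerate(x)]

def delay(x, d):
    """Delay a signal by d samples by prepending zeros."""
    return [0.0] * d + list(x)

def is_time_invariant(system, x, d=7, tol=1e-12):
    """Compare 'delay then process' with 'process then delay' for one test signal.
    A mismatch proves time variance; a match for one signal does not prove invariance."""
    a = system(delay(x, d))
    b = delay(system(x), d)
    return len(a) == len(b) and all(abs(p - q) <= tol for p, q in zip(a, b))

x = [1.0, -0.5, 0.25, 0.0, 0.8]
h = [0.6, 0.3, 0.1]
print(is_time_invariant(lambda s: convolve(s, h), x))  # True
print(is_time_invariant(modulated_gain, x))            # False
```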

[0]https://en.wikipedia.org/wiki/Time-invariant_system

Impulse response reverbs (Bricasti, etc) are based on time-invariant convolution.

Studio reverbs famously aren't, and some of the most popular models (notably Lexicon) have included time-variant algorithms since the late 70s. The processing power to handle IR convolution didn't exist at the time, and it turned out some time variation added lushness and density to the sound that simpler models couldn't capture.

Modelling a chorus or time-variant reverb with any form of convolution - including any convolution-based neural net - is a complete waste of time, because most chorus algos are trivial and convolution is completely the wrong tool for the job.

It's literally about as useful as taking a still picture of a 90 minute movie.
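For a sense of how small a chorus is: the usual textbook structure is a delay line whose delay time is swept by a low-frequency oscillator (LFO), with the delayed copy mixed back into the dry signal. A minimal mono sketch in Python; the default delay, depth, rate and mix values are illustrative assumptions, not taken from any particular pedal or plugin:

```python
import math

def chorus(x, sample_rate=48000, base_delay_ms=15.0, depth_ms=3.0, rate_hz=0.8, mix=0.5):
    """Minimal mono chorus: dry signal mixed with a copy read from an LFO-modulated delay line.
    Defaults are illustrative assumptions, not settings from any real unit."""
    y = []
    for n, dry in enumerate(x):
        # Delay in samples, swept by a sine LFO indexed by absolute time: this is the time variance.
        delay = (base_delay_ms + depth_ms * math.sin(2 * math.pi * rate_hz * n / sample_rate)) \
            * sample_rate / 1000.0
        pos = n - delay
        i = math.floor(pos)
        frac = pos - i
        # Linear interpolation between neighbouring input samples (zero before the signal starts).
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        wet = (1.0 - frac) * a + frac * b
        y.append((1.0 - mix) * dry + mix * wet)
    return y
```

The swept delay depends on the absolute sample index `n`, so the same note played at a different moment meets a different delay; that is exactly what a single fixed impulse response can't represent.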

Thanks for the info! Did some reading and I can see plain reverb being time invariant indeed :) I never realised chorus pedals did more than just stack frequency offsets onto your signal, but I only really play distorted so choruses are of limited use to me.

Pasting the other response below as well:

> Ah righto, the reverb pedal I'm most familiar with turns out to not be just reverb - EQD Afterneath does a whole bunch of funky stuff. Plain reverb though, yeah. I was approaching this more from the angle of training a neural network, where the input and output waves have to be correlated over a great span of time/samples.

Reverb is indeed linear time invariant (sans some rarer internal modulation techniques) but it's quite a high order filter.
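To make "high order" concrete: an IR reverb is an FIR filter with one tap per sample of the tail, so a 2-second tail at 48 kHz is a 96,000-tap filter. A quick Python sketch of that arithmetic, plus a crude synthetic IR (white noise under an exponential decay envelope, a common rough stand-in for a measured room, assumed here purely for illustration):

```python
import random

def ir_taps(tail_seconds, sample_rate):
    """Number of FIR taps needed to hold a reverb tail of the given duration."""
    return round(tail_seconds * sample_rate)

def direct_convolution_macs_per_second(taps, sample_rate):
    """Multiply-accumulates per second of mono audio for naive time-domain convolution."""
    return taps * sample_rate

def synthetic_ir(tail_seconds, sample_rate, rt60=None, seed=0):
    """Illustrative stand-in for a measured IR: Gaussian noise under an exponential envelope
    that reaches -60 dB (amplitude factor 1e-3) after rt60 seconds (defaults to the tail length)."""
    rt60 = rt60 or tail_seconds
    rng = random.Random(seed)
    decay_per_sample = 10 ** (-3.0 / (rt60 * sample_rate))
    return [rng.gauss(0.0, 1.0) * decay_per_sample ** n
            for n in range(ir_taps(tail_seconds, sample_rate))]

taps = ir_taps(2.0, 48000)
print(taps, direct_convolution_macs_per_second(taps, 48000))  # 96000 4608000000
```

At roughly 4.6 billion multiply-accumulates per second of audio for the naive approach, it's no surprise that practical convolution reverbs rely on FFT-based (typically partitioned) convolution instead.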
Then what you need is a multidimensional matrix of impulse responses, one for each combination of parameter settings. You can further simplify the model by limiting it to useful ranges of those combinations, etc.
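One way to read that suggestion: capture an IR at each point of a grid of knob positions, then at playback snap the requested settings to the nearest captured point and convolve with that IR. A minimal Python sketch; `IRBank`, the `capture` callback, the toy capture function and the normalized [0, 1] knob ranges are all hypothetical, and nearest-neighbour lookup is just the simplest choice (interpolating between neighbouring IRs is another):

```python
from itertools import product

def convolve(x, h):
    """Direct-form convolution of signal x with impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

class IRBank:
    """Hypothetical lookup table: one impulse response per captured knob combination.
    Knobs are normalized to [0, 1]; each needs at least 2 grid steps."""

    def __init__(self, steps_per_knob, capture):
        # capture(settings) -> impulse response; in practice this would be a measurement.
        self.grids = [[i / (s - 1) for i in range(s)] for s in steps_per_knob]
        self.irs = {combo: capture(combo) for combo in product(*self.grids)}

    def nearest(self, settings):
        """Snap each knob to its closest captured position."""
        return tuple(min(grid, key=lambda g: abs(g - v)) for grid, v in zip(self.grids, settings))

    def process(self, x, settings):
        return convolve(x, self.irs[self.nearest(settings)])

# Toy two-knob "effect" (made up): knob 0 scales the direct tap, knob 1 the echo tap.
bank = IRBank([3, 5], lambda c: [1.0 - 0.5 * c[0], 0.5 * c[1]])
print(len(bank.irs))                          # 15
print(bank.process([1.0, 0.0], (0.4, 0.3)))  # [0.75, 0.125, 0.0]
```

The table has one entry per grid point, so it grows with the same product of per-knob step counts that makes exhaustive sampling expensive, and each entry is still a single IR, so this only covers effects that are LTI at any fixed setting.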
> What that neural network learns is basically an approximation of a static impulse response.

I'm curious, do you have a reference or source for this? Distortion is non-linear, which makes it impractical to model using an impulse response. Is there something about neural networks that makes them good for modeling non-linear but time-invariant effects?
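The linearity point can be checked much like time invariance: a linear system must satisfy f(g·x) = g·f(x) for any gain g. A tanh waveshaper (a hypothetical, memoryless stand-in for an overdrive) fails this even though it is time-invariant, because its response depends on input level, which a single impulse response cannot capture. A minimal Python sketch:

```python
import math

def overdrive(x, drive=5.0):
    """Hypothetical memoryless overdrive: tanh soft clipping, not a model of any real pedal."""
    return [math.tanh(drive * s) for s in x]

def fir(x, h=(0.5, 0.3, 0.2)):
    """A short FIR filter, i.e. a system fully described by its impulse response h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def is_homogeneous(system, x, gain=2.0, tol=1e-9):
    """Linearity requires system(gain * x) == gain * system(x); one counterexample disproves it."""
    scaled_in = system([gain * s for s in x])
    scaled_out = [gain * s for s in system(x)]
    return all(abs(a - b) <= tol for a, b in zip(scaled_in, scaled_out))

x = [0.1, 0.5, -0.3]
print(is_homogeneous(fir, x))        # True
print(is_homogeneous(overdrive, x))  # False
```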

Even without parameterisation, it might be interesting as a "make my guitar sound like Jimmy Page" kind of tool.

Like you said, it will most likely have limitations, but it's still one more tool in the belt, regardless.

You're right in the "this is one more tool in the belt" sense, but there are modelers like the Kemper and Fractal already out there that make your guitar sound like... anyone... and they are really convincing. I'd argue this is almost a solved problem. Still cool, nonetheless.

Building the model is not solved, though. Kemper sort of does that (I'm not sure how), but an approach that simply measures the effect and creates a complete model in hours would radically change the industry. Companies like Yamaha (Line 6) would be able to add hundreds of simulations in months instead of a couple.

There's an Italian company already doing something like that for outboard gear: Acustica Audio. But they're using convolution instead of neural networks. It's instant instead of taking hours.

An acquaintance who builds boutique studio gear had some of his creations modeled by them, and we were quite impressed.

https://www.acustica-audio.com/store/en
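Acustica's actual capture process isn't described here, so the following is only a generic illustration of why convolution-based capture can be fast: for an LTI system, one recording of a known test signal is enough to recover the impulse response. A minimal Python sketch of noise-free time-domain deconvolution (plain polynomial long division), assuming the test signal's first sample is nonzero; the "unknown" system and test signal are made up for the example:

```python
def convolve(x, h):
    """Direct-form convolution: what the capture rig records when x is played through h."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y

def deconvolve(y, x, n_taps):
    """Recover the first n_taps of h from y = x * h, assuming x[0] != 0 and noise-free data."""
    if x[0] == 0:
        raise ValueError("test signal must start with a nonzero sample")
    h = []
    for n in range(n_taps):
        # y[n] = sum_k x[k] * h[n - k]; subtract the already-known terms, then solve for h[n].
        acc = y[n]
        for k in range(1, min(n, len(x) - 1) + 1):
            acc -= x[k] * h[n - k]
        h.append(acc / x[0])
    return h

h_true = [0.5, -0.25, 0.125, 0.0625]      # the "unknown" system (made up)
test_signal = [1.0, 0.3, -0.2, 0.1, 0.05]
recorded = convolve(test_signal, h_true)
print(deconvolve(recorded, test_signal, len(h_true)))
```

This recursion amplifies measurement noise, so practical setups typically use a sine sweep or a noise sequence and divide spectra instead; either way, a single measurement yields the IR, and it only captures the LTI part of the device's behaviour.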

> make my guitar sound like...

That isn't reasonable. There are too many variables beyond the effect, like room, fingers, guitar, and amp. Without the knobs, you haven't delivered the effect.