by munificent 2236 days ago
I'm not an expert on machine learning or DSP, but I do know just enough of each to suspect this isn't anywhere near as impressive as it seems.

A distortion pedal is essentially just a waveshaper [1]. Think of audio in digital terms as just a series of numbers. A waveshaper is just a simple mathematical function. To apply it, you literally just apply the function to each value in the input stream and there's your output stream. There's no memory or interesting algorithms going on. It's the audio equivalent to calling map() on your list of samples with some lambda to produce a new list of samples.
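To make the map() analogy concrete, here is a minimal sketch of a memoryless waveshaper. The tanh curve and the `drive` parameter are assumptions chosen for illustration, not the transfer curve of any particular pedal.

```python
import math

def waveshape(samples, drive=5.0):
    """Memoryless distortion: the same function is applied to every sample."""
    # tanh is an assumed soft-clipping curve; real pedals have their own shapes.
    return list(map(lambda x: math.tanh(drive * x), samples))

clean = [0.0, 0.1, 0.5, -0.5, 1.0]
dirty = waveshape(clean)
```

Each output value depends only on the input value at the same position, which is exactly the "no memory" property described above.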

Of course distortion pedals do that in the analogue domain using circuitry, which has some additional complexity because transistors and diodes and friends don't behave exactly like mathematical functions. There's "sag" and some other physical effects that cause the output to also somewhat depend on previous input.

Even so, that can generally be modelled using a simple convolution. Each output sample is calculated by taking some finite number of previous input samples, multiplying each of them by a weight factor, and then summing the results.
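The weighted sum described above can be sketched as a small FIR filter. The function name and the two-tap averaging weights are made up for the example.

```python
def fir_filter(samples, weights):
    """Each output is a weighted sum of the current and previous inputs."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, w in enumerate(weights):
            if n - k >= 0:  # samples before the start count as silence
                acc += w * samples[n - k]
        out.append(acc)
    return out

# Hypothetical weights: average each sample with the one before it.
smoothed = fir_filter([1.0, 2.0, 3.0, 4.0], [0.5, 0.5])
# smoothed == [0.5, 1.5, 2.5, 3.5]
```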

Does that sound like a neural net? It is. That's why we call them convolutional neural networks. Convolution is bread and butter in DSP. You can easily generate one that produces the same effect as some piece of hardware or acoustic environment by running an impulse (a single 1.0 sample surrounded by silence) through the system and then recording the result. That "impulse response" essentially is your set of convolution weights.
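A rough sketch of the impulse trick, assuming the system really is a linear filter. `mystery_linear_box` is a hypothetical stand-in for the hardware, with made-up hidden weights.

```python
def mystery_linear_box(samples):
    """Stand-in for the hardware: a hidden FIR filter with made-up weights."""
    hidden = [0.6, 0.3, 0.1]  # hypothetical; in practice you cannot see these
    return [
        sum(w * samples[n - k] for k, w in enumerate(hidden) if n - k >= 0)
        for n in range(len(samples))
    ]

def convolve(samples, weights):
    """Plain causal convolution of samples with the given weights."""
    return [
        sum(w * samples[n - k] for k, w in enumerate(weights) if n - k >= 0)
        for n in range(len(samples))
    ]

# A single 1.0 sample surrounded by silence.
impulse = [1.0, 0.0, 0.0, 0.0, 0.0]
impulse_response = mystery_linear_box(impulse)

# The recorded response, used as weights, reproduces the box on new input.
signal = [0.2, -0.4, 0.9, 0.1, -0.7]
from_box = mystery_linear_box(signal)
from_model = convolve(signal, impulse_response)
```

This only works because the stand-in box is linear and does not change over time.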

So using a deep neural network and then training it sounds a lot like overkill to me. You could accomplish much the same by using a "depth-1 network" and running an impulse through it.

Caveat, though: I am just a novice here, so there could very well be a lot of subtlety I'm missing out on.

[1]: https://en.wikipedia.org/wiki/Waveshaper

6 comments

I believe you are vastly oversimplifying this.

An impulse response will characterize only a system that is

* linear

* time-invariant

Many effects are not linear (especially distortion: the crunchiness comes from the nonlinearity). f(a) + f(b) != f(a+b)
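A quick numeric check of that inequality, using tanh as an assumed stand-in for a distortion curve. It also shows why scaling an impulse measurement mispredicts what happens to a quieter input.

```python
import math

def f(x):
    """Assumed distortion curve; any saturating function shows the same thing."""
    return math.tanh(3.0 * x)

a, b = 0.4, 0.5
# For a linear system this gap would be zero.
superposition_gap = abs(f(a) + f(b) - f(a + b))

# A linear model built from a full-scale impulse would scale that response down.
impulse_gain = f(1.0)
predicted = impulse_gain * 0.1  # linear prediction for an input of 0.1
actual = f(0.1)                 # what the nonlinear curve really produces
```

With these numbers the quiet input comes out almost three times louder than the linear prediction, because the curve compresses loud signals much more than quiet ones.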

And many effects are time varying, for example phasers and choruses which have low frequency oscillators controlling how the sound is shaped depending on when it comes in. Chorus for example will vary the pitch up and down.
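To illustrate time variance, here is a sketch using a tremolo (an LFO controlling volume) rather than a phaser or chorus, purely because it is the shortest LFO-driven effect to write. The rate and sample rate are arbitrary assumptions.

```python
import math

def tremolo(samples, rate_hz=5.0, sample_rate=100):
    """Time-varying gain: a low-frequency oscillator scales each sample."""
    out = []
    for n, x in enumerate(samples):
        lfo = 0.5 * (1.0 + math.sin(2.0 * math.pi * rate_hz * n / sample_rate))
        out.append(lfo * x)
    return out

def impulse_at(position, length=20):
    """Silence with a single 1.0 sample at the given position."""
    return [1.0 if n == position else 0.0 for n in range(length)]

# The same impulse, fed in at two different moments.
early = tremolo(impulse_at(0))
late = tremolo(impulse_at(5))
```

The two responses differ in size, not just in position, so no single recorded impulse response describes the effect.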

Yup! This covers the basics of control theory, a simple concept that most don't understand.

From a certain point of view, modern deep neural networks for audio are 'just' nonlinear adaptive filters on steroids.

Linear adaptive filters have been around for a long long time, and nowadays are everywhere. They can't capture the nonlinear behavior of effect pedals, not even just the waveshaper.

The model you are describing sounds like a 'Wiener model,' which refers to a linear filter followed by some nonlinearity (i.e. the waveshaper).
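A minimal sketch of that structure, a linear filter feeding a memoryless nonlinearity. The weights, the `drive` value, and the tanh curve are all assumptions for illustration.

```python
import math

def wiener_model(samples, weights, drive=4.0):
    """Linear FIR filter followed by a static waveshaper."""
    # Linear stage: weighted sum of current and previous inputs.
    filtered = [
        sum(w * samples[n - k] for k, w in enumerate(weights) if n - k >= 0)
        for n in range(len(samples))
    ]
    # Nonlinear stage: the same curve applied to every filtered sample.
    return [math.tanh(drive * y) for y in filtered]

out = wiener_model([1.0, 0.0, 0.0], [0.5, 0.5])
```

The second output sample is nonzero even though its input is silent (memory from the filter), and both nonzero outputs are squashed toward 1.0 (the waveshaper).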

There are other approaches to nonlinear adaptive filters, like Volterra series and kernel methods.

People have been using all of these techniques, and more, to approximate analog audio effects for decades.

A 'trained deep neural network' is not in principle that much different or 'less pure' than other nonlinear adaptive filtering techniques, just with a load more parameters. What matters is if the results are sufficiently improved to justify the computation.

I think the real innovation here is that this was done on just a few minutes of training data, opening up the possibility for all kinds of effects / amps to be modeled through this same method somewhat easily. I'm not sure how current DSPs are designed, but this is likely orders of magnitude simpler than designing the audio transformations (digital or analog) manually.

I think you're hand-waving away all the complexity. You're right that distortion is pretty much waveshaping. But all the nuance, "warmth" and lovely non-linearities that make these pedals highly sought after are the really, really hard part. It can't be simply solved with convolution.

The same pedal from this post has been painstakingly circuit modeled by Cytomic[1] over the past few years and still isn't out of beta. Analog circuit modeling is a huge thing in DSP right now because it's the closest we have to proper 1:1 software clones of analog hardware. But it's incredibly time-consuming.

I'm really excited by this use of WaveNet. It could drastically cut down the time to clone old, costly-to-maintain hardware. But it will have some way to go before you can tweak the parameters in realtime. Or so I assume?

[1]: https://cytomic.com/#plugins

Also not an expert, but that sounds about right to me.

I imagine the difficulty in designing these models comes from modeling the variable factors, i.e. the parameters normally controlled by the knobs on the amp or effect. Some of these should be straightforward (for example "gain" increasing the volume of the input signal), but I suspect that in some pedals changing one parameter can affect how other parameters behave. I don't see any mention of how this "deep learning" model works with that.

Guitar modeling gear has been around for about 25 years (the first Line6 amp debuted in 1996; I'm not sure if there were earlier products brought to market). They've been derided by purists, but have kind of turned a corner in recent years and are now becoming very mainstream.

Some modern products, such as those sold by Kemper, actually allow you to plug in to your existing gear and generate a profile based on the impulse response. The results, at least according to the reviews I've read, are actually very impressive.

> You could accomplish much the same by using a "depth-1 network" and running an impulse through it

This would be true for a linear impulse response. However, for these kinds of effects you need both state/memory (like a convolution) and non-linearity (like a waveshaper), which is why people use RNNs and CNNs.

Ah, good point. Thank you for mentioning non-linearity. This has helped clarify my novice thinking on this.