Pretty cool, though I wonder what the latency of this would be if used as a plugin?
The author says it works in real time, but to non-music/audio folks this could mean '100 ms latency is real-time enough, right?'
Generally, I think the audio VST business is a really fun space to be in for a lifestyle business, as it is way too small to be attractive for VCs. It seems like a space that provides many niches for lots of small players to thrive in.
As an aside, it's really quite interesting that a lot of cutting edge tech is now used to emulate the hardware-based tech of yesteryear. Think film filters for photoshop, and about 90% of all audio plugins that emulate high end hardware, compressors, pedals, etc etc.
I know of a few shops that took VC money. The big problem isn't the market size so much as how slow the market moves. The product lifetime of a plugin is around a decade. And users hate subscriptions. And it's really hard to determine the value you add to your customers. And no one wants to pay you.
It's basically a terrible place to be a developer in it for the money. Really fun work otherwise. The cool gigs are the ones where you build custom plugins for someone's crazy idea.
In consumer applications, plugins are used all the time for prototyping before you go to hardware. MATLAB is way too slow for anything useful.
The success of Splice would disagree with your notion that “users hate subscriptions”. Given the horrendous price point of many of these plugins, they seem perfect for a subscription-based model. To me it always seemed there is more pushback from the industry producing VSTs than from the consumers.
Splice's numbers aren't public so I can't comment on their success. Avid's are, and they had a terrible quarter - and they're the poster child (alongside Adobe) for subscription licensing in creative software. But I'd be interested to see what the breakdown in revenue is for plugin licenses versus preset/sample packs (bit of a blade & razor model there).
The price points really aren't horrendous if you consider how expensive the engineering is, how little demand there is, and how long you need to maintain a product. You aren't being ripped off by spending a couple hundred bucks on a plugin. I think we'll end up at a place where everything is a subscription, but I can tell you from experience that it creates friction for the users.
Agreed. The business model seems to be to give access to the rent-to-own deals via the sample subscription fee. Don’t think they make any money off their plugin deals. I’m also not arguing it’s too expensive or a rip-off. But it’s still a large amount of money for software, in the private space at least. The rent-to-own thing seems like a smart tool to remove the barrier to entry.
Do solo or small-shop VST plugin developers make any money?
I’m curious if anyone has any direct knowledge about that.
There are so many professional activities similar to that where no one makes any money and people really just do it for the love, and then there are seemingly similar things like that where people make surprisingly large amounts of money.
I'm fairly new to the game, but I'm a solo developer. Currently I don't make enough to quit my day job, but it is a nice supplementary income, and it's nice to get paid a bit for something I truly enjoy.
There are also several solo/small shop developers that do make a living from selling plug-ins. Here are a few that I can think of off the top of my head.
Steve Duda, the developer of Serum, is kind of the poster child for this. He contracts out for pieces of the synth (UI design, resampler, filters), but he's mostly a one-man shop and, as I understand it, Serum pays the bills.
It's hard to tell how much of an outlier Duda is, though, and how many other people could successfully follow his path.
I was in talks with a (new-style) 'label' that sells samples, sound packs, and VST plugins. Some of their plugins have been purchased 25k times.
One of the things I've also heard from labels is that not only is there money in the VST world (even though it's very crowded and piracy is rampant, as noted), but a lot of plugins are also ported over to iOS and sold as "virtual pedals". The number of sales and the revenue there were noted as being very interesting.
When I had an active band, our guitarist went from bringing his amp to rehearsal, to having a bunch of pedals, to having a digital pedal board, to having an iPhone with some sort of tiny adapter.
I made fun of him and we wouldn't have trusted it to be used live, but damn, it worked impressively well.
They do. Strezov sampling is one guy. Serum is one guy. Chris Heinz is one guy, etc. etc.
But you have to be willing to put in the time and make phenomenal products, because no one wants average instruments and effects, we can get those for free.
Quite a few small developers in this space. It's not like indie gaming, but there's also less competition.
I think you need to be a musician/producer to be successful here though.
Steve Duda wrote Serum, probably the most popular synth plugin in modern electronic music. Everyone I know has a license. So "yes", with the caveat that it's difficult to actually create products of this level of quality.
AFAIK, Mike Scuffham (www.scuffhamamps.com) earns a living developing and selling S-Gear. It might be a semi-retirement or lifestyle type living (not sure), but he's been doing it for over a decade now. He doesn't charge as much as he could and gives away free updates for far too long. Despite being a (mostly, at least) solo effort, it's widely regarded as a top-tier amp sim. I personally think it sounds better than both Helix and Bias, which are both heavily bankrolled outfits.
It doesn't have their breadth, but the tones it does have are nearly as good as it gets without serious air movement.
There's latency and there's the somewhat separate question of how much time is needed to make a prediction. Wavenet is causal (no look-ahead) and operates on the sample level so there are no buffers and thus no latency in the strict sense, beyond encoding/decoding into the sample rate and format required by the ML model, which should take <1ms.
Whether a model manages to make a prediction in that amount of time depends on things like the receptive field and the number of layers. The linked paper says their custom implementation runs at 1.1x real-time. I guess this isn't impossible; their receptive field is ~40 ms, vs. 300 ms for the original (notoriously slow) WaveNet, and the model is likely to have fewer layers and channels.
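For a stack of causal dilated convolutions, the receptive field follows directly from the kernel size and dilations: each layer with kernel size k and dilation d adds (k - 1) * d samples of history. A minimal sketch; the kernel size of 3, the dilation pattern 1, 2, 4, ..., 512 and the 44.1 kHz sample rate are illustrative assumptions, not the linked paper's configuration.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples that influence one output sample
    in a stack of causal dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def samples_to_ms(samples, sample_rate):
    """Convert a length in samples to milliseconds."""
    return 1000.0 * samples / sample_rate


# Hypothetical config: kernel size 3, dilations 1, 2, 4, ..., 512 (10 layers).
dilations = [2 ** i for i in range(10)]
rf = receptive_field(3, dilations)
print(rf, "samples =", round(samples_to_ms(rf, 44100), 1), "ms at 44.1 kHz")
# prints: 2047 samples = 46.4 ms at 44.1 kHz
```

Note that the receptive field is how much history the model looks at, not added delay: a causal model still emits one output per input sample, so it only has to keep up with the sample rate.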
"Round trip," or guitar to processing to speakers needs to be sub 10ms to be transparent to the musician. Source: spent years playing guitar through my guitar -> DAC -> PC -> DAC -> speaker signal chain
That's not what real time means, though. Real-time processing means taking signals as they come in and outputting the transformed result with as close to no signal lag as possible. The output can in fact be wildly lower or higher resolution; real-time does not say anything in particular about that. It's all about whether the output plays (for practical purposes) at the perceived "same time" as the input signal. There will always be some delay, but that delay can't become perceptible, and for obvious reasons there can't be any (significant) buffering.
Is that your private definition of "real-time"? I think it is common to define real-time processing by a specified, finite time between input and output. Many real-time processes are concerned more with the consistency of the latency than with its absolute value.
Latency is much more noticeable when you’re playing a musical instrument; 25-30 ms is the point at which it becomes distracting in my (anecdotal) experience as a keyboardist. 50 ms would be literally unplayable: I cannot keep in time if latency is that severe. And that’s total output latency from the moment a key is depressed to the moment the sound comes out of the speakers, so it’s important for every component in the signal chain to have the lowest possible latency. A bunch of 5-10 ms delays adds up really quickly.
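The "adds up really quickly" point is simple arithmetic: a buffer of N samples at sample rate fs adds N / fs seconds, and every stage in the chain contributes its share. A minimal sketch; the stage names, buffer sizes and the 1 ms converter figure are hypothetical numbers chosen only to show how the total is assembled.

```python
def buffer_latency_ms(buffer_size, sample_rate):
    """Delay contributed by one buffer of `buffer_size` samples."""
    return 1000.0 * buffer_size / sample_rate


def total_latency_ms(stages):
    """Sum the per-stage latencies (in ms) of a signal chain."""
    return sum(ms for _, ms in stages)


fs = 48000
# Hypothetical chain; real numbers depend on the interface, driver and plugins.
chain = [
    ("input buffer, 128 samples", buffer_latency_ms(128, fs)),
    ("output buffer, 128 samples", buffer_latency_ms(128, fs)),
    ("plugin internal block, 256 samples", buffer_latency_ms(256, fs)),
    ("ADC + DAC filters (assumed)", 1.0),
]
for name, ms in chain:
    print(f"{name:<40}{ms:6.2f} ms")
print(f"{'total':<40}{total_latency_ms(chain):6.2f} ms")
```

Each stage looks small on its own, but the four together already land past 10 ms.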
I think "rate" in the parent comment was just referring to speed, not sample rate. But yes, latency is critical for anything used during recording or performance. However way back when I used to make my own music I used non-realtime plugins sometimes and it was okay.
Real-time has a few slightly different meanings. So it's hard to say what the author means.
One meaning is just that you can guarantee specific deadlines. So if your programme can react within an hour guaranteed, that would be real-time. (Though usually we are talking about tighter deadlines, like what's needed to make ABS brakes work.)
For 'real time' music usage you wouldn't need strict guarantees, but something that's usually fast enough.
Implementing a VST plugin is literally the exact definition of requiring strict latency guarantees. Your comment winds through a lot of unrelated comparisons to ultimately not make any sense.
“Usually fast enough” are three words that guarantee failure in a live show/MIDI environment, which is a large use case of VST and its peers beyond production. By extension, “usually fast enough” further guarantees nobody will ever use your software. That’s noticeable right away.
The question isn’t about compsci real-time theorycrafting, it’s “here’s a buffer of samples, if you don’t give it back in a dozen milliseconds the entire show collapses.” That’s pretty clearly what “real time” means in this context.
"Usually fast enough" is unfortunately the only guarantee a preemptive multitasking OS can give you. Unless your system is guaranteeing your program x cycles of uninterrupted processing per frame of audio and you can consistently process the frame in that amount of cycles, the only mitigation is to deliver frames in large enough chunks that you never run out of time in practice under agreeable circumstances.
That said, I agree that the question of what "real-time" might mean is irrelevant given the context.
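A toy model of the trade-off described above: on a preemptive OS, assume any block can occasionally be hit by a scheduling spike, and that processing costs a fixed time per sample. Larger blocks have more real-time slack to absorb the spike, at the price of latency. The per-sample cost and spike duration are made-up numbers, not measurements of any real host.

```python
import math


def misses_deadline(block_size, sample_rate, cost_per_sample_s, spike_s):
    """True if one block, hit by a scheduling spike of `spike_s` seconds,
    takes longer to process than the block lasts in real time."""
    deadline_s = block_size / sample_rate
    work_s = block_size * cost_per_sample_s + spike_s
    return work_s > deadline_s


def min_safe_block_size(sample_rate, cost_per_sample_s, spike_s):
    """Smallest block size that survives a spike of `spike_s` seconds,
    or None if the DSP alone is slower than real time."""
    slack_per_sample = 1.0 / sample_rate - cost_per_sample_s
    if slack_per_sample <= 0:
        return None
    return math.ceil(spike_s / slack_per_sample)


# Assumed: 48 kHz, 10 us of work per sample, an occasional 2 ms spike.
for n in (64, 128, 256, 512):
    print(n, "underrun" if misses_deadline(n, 48000, 1e-5, 2e-3) else "ok")
# 64 and 128 underrun; 256 and 512 are ok
print(min_safe_block_size(48000, 1e-5, 2e-3))  # 185
```

In this model, no block size makes a spike impossible; it only makes a given spike survivable, which is the "usually fast enough" point.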
It is completely irrelevant, given the context. The only, only, only thing real-time means here is “can be run on a live signal passing through it” rather than “is a slow, offline effect for a DAW”. No hard real-time, no soft real-time, no QNX, no pulling out the college compsci textbook. There IS real-time in that sense in DSP, it just isn’t in a VST plugin.
I’ll repeat again that any compsci theorycrafting is not the concern here, and real-time has a very specific meaning in DSP. Computer science does not own the concept of real-time, and the only people tripping over the terminology are those with more compsci experience than DSP. I appreciate everyone trying to explain this to me, but (a) I understand both, and (b) this is like saying “no, Captain, a vector could mean anything like a mathematical collection, air traffic control should learn a thing or two from mathematics.”
Just to be perfectly clear here because I'm not sure you're just using my post as a soapbox or if you have misunderstood my argument: I agree that it's clear what real-time means in this context. I disagree that "usually fast enough" guarantees failure for a VST, because in the case of VST, "usually fast enough" is the only guarantee the host operating system will offer your software.
It's not "theorycrafting" to say that real-time music software running in a preemptive multitasking operating system without deterministic process time allocation will have to suffer the possibility of occasional drops. It happens in practice and audio drivers have to be implemented to account for the bulk of it, and the VST API is designed in such a way that failure to fill a buffer on time needn't be fatal.
Not to mention if the inference is done on the CPU, it shouldn't be that hard to control it. The matrices are of a set size by the time you're running a VST; this is the actual simple answer.
The medium answer is "this is a wavenet model, so inference is probably really expensive unless the continuous output is a huge improvement to performance".
Indeed. Having myself spent some time in the "VST lifestyle business" when I was in grad school (was selling a guitar emulation based on physical modelling synthesis), and now working in ML, I think there's no chance for such an approach to hit "mainstream" anytime soon. Even if you do your inference on CPU, most deep learning libraries are designed for throughput, not latency. In a VST plugin environment, you're also only one of the many components requiring computation, so your computational requirements better be low.
You might be able to combine it with the recent work on minimizing models to obtain something that is small enough to run reliably in real time.
Although the unusual structure of the net here may mean you're doing original and possibly publication-level work to adapt that stuff to this net structure.
If you were really interested in this, there could also be some profit in minimizing the model and then figuring out how to replicate it in a non-neural net way. Direct study of the resulting net may be profitable.
(I'm not in the ML field. I haven't seen anyone report this but I may just not be seeing it. But I'd be intrigued to see the result of running the size reduction on the net, running training on that network, then seeing if maybe you can reduce the resulting network again, then training that, and iterating until you either stop getting reduced sizes or the quality degrades too far. I've also wondered if there is something you could do to a net to encourage it not to have redundancies in it... although in this case the structure itself may do that job.)
I wonder if teddykoker has looked at applying FFTNet or similar methods as a replacement for Wavenet. I'm not sure but it seems to me like FFTNet is a lot more tractable than Wavenet, and not necessarily that much worse for equivalent training data.
No, the other guy is right. Technically the definition of real-time can have a lot of leeway. Here's the paper linked in the article. Note how the authors never define what they really mean by real-time. They even make statements like "runs 1.9 times faster than real-time". They certainly imply your definition, but there's plenty of wiggle room to say "Well technically, I wasn't lying"
If you drop an audio buffer and fire off a 22kHz impulse into a 50,000 watt soundsystem, you are going to have thousands of very unhappy people and likely some hearing damage.
Yes, it absolutely 100% will, depending on what you mean by handwaving “glitch”. VST is built into chains, and a flaky plugin will derail an entire performance, often making downstream plugins crash. I’m speaking from extensive experience writing plugins and performing with them in multiple hosts and trigger setups. It’s not a robust protocol, but it gets the job done.
Are you speaking from some experience with which I’m unfamiliar where it’s okay for DSP code to fail hourly? Trying to understand your viewpoint.
Agreed. If anyone wants to see some of the more successful DSP work being done today for pro or prosumer audio, I recommend checking out Strymon and Universal Audio products. Both make use of SHARC SoCs and achieve great results.
Are there any VST containers? Something that will wrap the VST, intercept under-runs or other bad behaviour and substitute some alternative signal (zero, passthrough, etc.). This could also be part of the host software.
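A minimal sketch of the "bad behaviour" half of such a container, written in Python for illustration only (a real host would do this in C/C++ on the audio thread). `SafePlugin` is a hypothetical name, not an existing API. It substitutes passthrough or silence when the wrapped processor raises, returns the wrong length, or emits NaN/inf. Catching a *late* buffer is harder, because the deadline has already passed by the time you notice, so this sketch does not attempt it.

```python
import math


class SafePlugin:
    """Hypothetical host-side wrapper around a block-processing function.

    If the wrapped processor raises, returns the wrong number of samples,
    or produces non-finite values, the block is replaced by a fallback
    signal (passthrough or silence) instead of being sent downstream.
    """

    def __init__(self, process, fallback="passthrough"):
        if fallback not in ("passthrough", "silence"):
            raise ValueError("fallback must be 'passthrough' or 'silence'")
        self.process = process
        self.fallback = fallback
        self.failures = 0  # count of substituted blocks, for diagnostics

    def _substitute(self, block):
        self.failures += 1
        if self.fallback == "passthrough":
            return list(block)
        return [0.0] * len(block)

    def __call__(self, block):
        try:
            out = list(self.process(block))
        except Exception:
            return self._substitute(block)
        if len(out) != len(block) or not all(math.isfinite(x) for x in out):
            return self._substitute(block)
        return out


# Usage: a processor that emits NaNs gets replaced by the dry signal.
broken = SafePlugin(lambda b: [float("nan")] * len(b))
print(broken([0.1, -0.2]), broken.failures)
```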
The article and your comments inspired in me the idea of a wave-net based VST learning wrapper. If the real plugin fails, substitute a wave-net based simulation of the plugin.
End-to-end modelling is very enticing for the lazy engineer; unfortunately, parameter control (knobs) is an important feature of most audio effects, and sampling enough of the parameter space will become prohibitive for more complex effects. That's why the traditional approach is divide-and-conquer.
Also, I don't think this approach will work well with time-varying effects such as chorus, although I'm happy to be proven wrong.
Not saying that it is in any way practical or useful in the real world, but I think there are approaches which are more geared towards what we might understand as 'emulating' rather than modelling an effect. It seems that emulators can be learned from data with surprising efficiency [1]. These would be amenable to parameter control.
What that neural network learns is basically an approximation of a static impulse response. So while it can simulate linear time-invariant effects such as reverb quite nicely, it'll surely have issues with chorus.
Reverb is time invariant? You can set custom decay time, rate, etc., so one note can be heard for, say, 10 seconds if you want to go full Devin Townsend. I'd think chorus would work better.
I wanted to do a very similar project, but with an overdrive. Let's see if I get time anytime soon!
You might want to familiarize yourself with [0]. Time-invariance is a specific property of a system, where the output (for any given input) does not depend on whether the input signal happens now or 1 second from now or 100 years from now (except for the corresponding delay). Most reverb models are, to a first approximation, time invariant, because the effect will have the same sound for the same guitar line, no matter when you play the line.
Chorus, on the other hand, has a (perhaps subtle) modulator to get that warbly (scientific word!) sound. It doesn't feel like a time-based effect, but it certainly is and that makes it quite a lot more difficult to mimic with a system that (as others have noted) boils down to an impulse response.
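The definition above can be checked numerically: delay the input, and see whether the output is just the original output delayed by the same amount. A minimal sketch comparing a convolution (FIR filter) with a toy chorus whose delay is swept by a sine LFO. The chorus is a hypothetical bare-bones implementation with arbitrary parameters and an assumed 8 kHz sample rate, not a model of any real pedal.

```python
import math


def fir(x, h):
    """Linear time-invariant filter: causal convolution of x with taps h."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def chorus(x, sample_rate=8000, base_delay=20, depth=10, rate_hz=2.0, mix=0.5):
    """Toy chorus: dry signal mixed with a copy whose delay (in samples)
    is swept by a sine LFO. The LFO depends on absolute time n."""
    y = []
    for n in range(len(x)):
        d = base_delay + depth * math.sin(2 * math.pi * rate_hz * n / sample_rate)
        pos = n - d
        i = math.floor(pos)
        frac = pos - i
        # Linear interpolation between the two neighbouring input samples.
        a = x[i] if 0 <= i < len(x) else 0.0
        b = x[i + 1] if 0 <= i + 1 < len(x) else 0.0
        y.append((1 - mix) * x[n] + mix * ((1 - frac) * a + frac * b))
    return y


def is_time_invariant(system, x, shift, tol=1e-9):
    """Check whether delaying the input by `shift` samples only delays
    the output by `shift` samples (zero-padded test signal)."""
    y_shifted = system([0.0] * shift + x)
    y_then_shift = [0.0] * shift + system(x + [0.0] * shift)
    return all(abs(a - b) <= tol for a, b in zip(y_shifted, y_then_shift))


x = [1.0] + [0.0] * 199  # unit impulse
print(is_time_invariant(lambda s: fir(s, [0.5, 0.3, 0.2]), x, 1000))  # True
print(is_time_invariant(chorus, x, 1000))  # False: the LFO is at a different phase
```

The chorus fails because its delay depends on when the note arrives, which is exactly what a single impulse response cannot represent.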
Impulse response reverbs (Bricasti, etc) are based on time-invariant convolution.
Studio reverbs famously aren't, and some of the most popular models (notably Lexicon) have included time-variant algorithms since the late 70s. The processing power to handle IR convolution didn't exist, and it turned out some time variation added lushness and density to the sound that simpler models couldn't capture.
Modelling a chorus or time-variant reverb with any form of convolution - including any convolution-based neural net - is a complete waste of time, because most chorus algos are trivial and convolution is completely the wrong tool for the job.
It's literally about as useful as taking a still picture of a 90 minute movie.
Thanks for the info! Did some reading and I can see plain reverb being time invariant indeed :) I never realised chorus pedals did more than just stack frequency offsets onto your signal, but I only really play distorted so choruses are of limited use to me.
Pasting the other response below as well:
> Ah righto, the reverb pedal I'm most familiar with turns out to not be just reverb - EQD Afterneath does a whole bunch of funky stuff. Plain reverb though, yeah. I was approaching this more from the angle of training a neural network, where the input and output waves have to be correlated over a great span of time/samples.
Then what you need is a multidimensional matrix of impulse responses, one for each combination of parameters. You can further simplify the model by limiting it to only useful combination ranges, etc.
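A minimal sketch of that idea for a single knob, assuming one measured impulse response per knob position and linear interpolation between the two nearest measured IRs for in-between settings. Interpolating IRs this way is an assumption of the sketch and can smear phase between settings; at any fixed setting this is still a linear time-invariant model, so it inherits the limits discussed in this thread for distortion and modulation. The IR values are placeholders.

```python
import bisect


def interpolate_ir(ir_table, knob):
    """Blend the two measured IRs nearest to `knob`.

    ir_table: dict mapping knob position (float) -> list of IR taps,
    all of equal length. Settings outside the table are clamped.
    """
    positions = sorted(ir_table)
    if knob <= positions[0]:
        return list(ir_table[positions[0]])
    if knob >= positions[-1]:
        return list(ir_table[positions[-1]])
    j = bisect.bisect_right(positions, knob)
    lo, hi = positions[j - 1], positions[j]
    t = (knob - lo) / (hi - lo)
    return [(1 - t) * a + t * b for a, b in zip(ir_table[lo], ir_table[hi])]


def convolve(x, h):
    """Causal convolution, output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


# Placeholder IRs measured (hypothetically) at knob positions 0 and 10.
table = {0.0: [1.0, 0.0, 0.0], 10.0: [0.2, 0.5, 0.3]}
h = interpolate_ir(table, 2.5)
print(convolve([1.0, 0.0, 0.0, 0.0], h))
```

With more knobs the table becomes a grid and the blend becomes multilinear, which is where the "prohibitive sampling" concern raised earlier in the thread comes from.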
> What that neural network learns is basically an approximation of a static impulse response.
I'm curious, do you have a reference or source for this?
Distortion is non-linear making it impractical to model using an impulse response.
Is there something about neural networks that makes them good for modeling non-linear but time-invariant effects?
You're right in the "this is one more tool in the belt" sense but there are modelers like the Kemper and Fractal already out there that make your guitar sound like... anyone... and they are really convincing. I'd argue this is almost a solved problem. Still cool, nonetheless.
Building the model is not solved, though. Kemper sort of does that (I'm not sure exactly how), but an approach that simply measures the effect and creates a complete model in hours would radically change the industry. Companies like Yamaha (Line 6) would be able to add hundreds of simulations in months instead of a couple.
There's an Italian company already doing something like that for offboard gear: Acustica Audio. But they're using convolution instead of neural networks. It's instant instead of taking hours.
An acquaintance who builds boutique studio gear had some of his creations modeled by them, and we were quite impressed.
That isn't reasonable. There are too many variables beyond the effect, like room, fingers, guitar, and amp. Without the knobs, you haven't delivered the effect.
This isn't bad, but the note decays sound noticeably different. My guess is that the NN doesn't know that human ears have non-linear response that makes them more sensitive to errors in the decay than the attack, so it treats them equivalently. If this is the case then it might be fixable by using logarithmic scale audio samples instead of linear.
The non-linearity of the ear is frequency dependent[0], but in practice I suspect it would be sufficient to pre-process the linear PCM data with x = sign(x)*sqrt(|x|) and undo it before playback with x = sign(x)*x^2 (the sign has to be kept because PCM samples are signed).
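A minimal sketch of that pre/post-processing with the sign preserved, alongside mu-law companding (the scheme the original WaveNet paper used to quantize its audio) for comparison. mu = 255 is the usual 8-bit choice; whether either transform actually fixes the decays in this model is untested here.

```python
import math


def compress_sqrt(x):
    """Signed square root: boosts quiet samples relative to loud ones."""
    return math.copysign(math.sqrt(abs(x)), x)


def expand_sqrt(y):
    """Inverse of compress_sqrt."""
    return math.copysign(y * y, y)


MU = 255.0


def mu_law_encode(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], giving more resolution near zero."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_decode(y, mu=MU):
    """Inverse of mu_law_encode."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


for x in (-0.5, -0.01, 0.001, 0.25):
    print(x, round(compress_sqrt(x), 4), round(mu_law_encode(x), 4))
```

Both curves spend more of the model's output range on quiet samples, which is where a decaying note tail lives.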
I came into the comments to say the same thing. To my ears, the NN versions roll off unnaturally at the end and that makes them really easy to identify as artificial.
I'm not an expert on machine learning or DSP, but I do know just enough of each to suspect this isn't anywhere near as impressive as it seems.
A distortion pedal is essentially just a waveshaper [1]. Think of audio in digital terms as just a series of numbers. A waveshaper is just a simple mathematical function. To apply it, you literally just apply the function to each value in the input stream and there's your output stream. There's no memory or interesting algorithms going on. It's the audio equivalent to calling map() on your list of samples with some lambda to produce a new list of samples.
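The map() analogy, written out: a memoryless waveshaper applied sample by sample. tanh is used here as an assumed soft-clipping curve for illustration, not a model of any particular pedal. The same curve leaves quiet input nearly untouched but squashes loud input, which is the level-dependent "breakup" guitarists describe.

```python
import math


def waveshape(samples, drive=1.0):
    """Memoryless soft clipper: each output depends only on the current input."""
    return list(map(lambda x: math.tanh(drive * x), samples))


quiet = [0.01, -0.02, 0.015]
loud = [0.9, -1.2, 1.5]
print(waveshape(quiet, drive=5.0))  # close to 5x the input: nearly linear
print(waveshape(loud, drive=5.0))   # all close to +/-1: heavily clipped
```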
Of course distortion pedals do that in the analogue domain using circuitry, which has some additional complexity because transistors and diodes and friends don't behave exactly like mathematical functions. There's "sag" and some other physical effects that cause the output to also somewhat depend on previous input.
Even so, that can generally be modelled using a simple convolution. Each output sample is calculated by taking some finite number of previous input samples, multiplying each of them by a weight factor, and then summing the results.
Does that sound like a neural net? It is. That's why we call them convolutional neural networks. Convolution is bread and butter in DSP. You can easily generate one that produces the same effect as some piece of hardware or acoustic environment by running an impulse (a single 1.0 sample surrounded by silence) through the system and then recording the result. That "impulse response" essentially is your set of convolution weights.
So using a deep neural network and then training it sounds like overkill to me. You could accomplish much the same by using a "depth-1 network" and running an impulse through it.
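For a linear, time-invariant system the impulse-response trick works exactly as described: probe the system with a unit impulse, then convolve. A minimal sketch with a hypothetical "black box" (a one-pole low-pass filter standing in for the hardware). The IR is truncated to a finite length, so for signals longer than the IR the match would only be approximate.

```python
def black_box(x, a=0.6):
    """Hypothetical LTI device: one-pole low-pass y[n] = (1-a)x[n] + a*y[n-1]."""
    y, prev = [], 0.0
    for s in x:
        prev = (1 - a) * s + a * prev
        y.append(prev)
    return y


def measure_ir(system, length):
    """Record the system's response to a unit impulse."""
    impulse = [1.0] + [0.0] * (length - 1)
    return system(impulse)


def convolve(x, h):
    """Causal convolution, output truncated to len(x)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


h = measure_ir(black_box, 64)
x = [0.3, -0.5, 0.8, 0.0, 0.1, -0.2] * 5
err = max(abs(p - q) for p, q in zip(black_box(x), convolve(x, h)))
print(f"max error with a 64-tap IR: {err:.2e}")  # rounding-level only
```

The catch, as the replies point out, is that this only holds when the black box is linear and time-invariant.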
Caveat, though: I am just a novice here, so there could very well be a lot of subtlety I'm missing out on.
I believe you are vastly oversimplifying this.
An impulse response will characterize only a system that is
* linear
* time-invariant
Many effects are not linear (especially distortion: the crunchiness comes from the nonlinearity). f(a) + f(b) != f(a+b)
And many effects are time varying, for example phasers and choruses which have low frequency oscillators controlling how the sound is shaped depending on when it comes in. Chorus for example will vary the pitch up and down.
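The superposition property in the bullet list is easy to check numerically. A minimal sketch, using an assumed tanh soft clipper as the "distortion" and a plain gain as the linear reference. Any gap between f(a + b) and f(a) + f(b) means no single impulse response can describe the system.

```python
import math


def is_linear(system, a, b, tol=1e-9):
    """Test additivity and homogeneity: f(a + b) == f(a) + f(b) and f(2a) == 2 f(a)."""
    lhs = system([x + y for x, y in zip(a, b)])
    rhs = [x + y for x, y in zip(system(a), system(b))]
    scaled = system([2 * x for x in a])
    doubled = [2 * x for x in system(a)]
    return (all(abs(p - q) <= tol for p, q in zip(lhs, rhs))
            and all(abs(p - q) <= tol for p, q in zip(scaled, doubled)))


def gain(x):
    return [0.5 * s for s in x]


def distortion(x):
    return [math.tanh(3.0 * s) for s in x]


a = [0.2, -0.4, 0.6]
b = [0.3, 0.1, -0.5]
print(is_linear(gain, a, b))        # True
print(is_linear(distortion, a, b))  # False
```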
From a certain point of view, modern deep neural networks for audio are 'just' nonlinear adaptive filters on steroids.
Linear adaptive filters have been around for a long long time, and nowadays are everywhere. They can't capture the nonlinear behavior of effect pedals, not even just the waveshaper.
The model you are describing sounds like a 'wiener model,' which refers to a linear filter followed by some nonlinearity (i.e. the waveshaper).
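A minimal sketch of the Wiener structure: a linear FIR stage supplies memory, and a memoryless tanh supplies the nonlinearity. The taps and drive are hypothetical; a real pedal model would fit them to measurements and would still lack the knob dependence discussed elsewhere in the thread.

```python
import math


def fir(x, h):
    """Linear part: causal FIR filter (provides memory)."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def wiener_model(x, h, drive=4.0):
    """Wiener model: linear filter followed by a static nonlinearity."""
    return [math.tanh(drive * v) for v in fir(x, h)]


# Hypothetical taps and drive, chosen only for illustration.
h = [0.5, 0.3, 0.2]
y = wiener_model([0.0, 0.1, 0.5, -0.5, 0.0], h)
print([round(v, 4) for v in y])
```

Swapping the order (nonlinearity first, then the filter) gives the related Hammerstein model; chaining both is a common next step.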
There are other approaches to nonlinear adaptive filters, like Volterra series and kernel methods.
People have been using all of these techniques, and more, to approximate analog audio effects for decades.
A 'trained deep neural network' is not in principle that much different or 'less pure' than other nonlinear adaptive filtering techniques, just with a load more parameters. What matters is if the results are sufficiently improved to justify the computation.
I think the real innovation here is that this was done on just a few minutes of training data, opening up the possibility for all kinds of effects / amps to be modeled through this same method somewhat easily. I'm not sure how current DSPs are designed, but this is likely orders of magnitude simpler than designing the audio transformations (digital or analog) manually.
I think you're hand waving away all the complexity. You're right that distortion is pretty much waveshaping. But all the nuance, "warmth" and lovely non-linearities that make these pedals highly sought after is the really really hard part. It can't be simply solved with convolution.
The same pedal from this post has been painstakingly circuit-modeled by Cytomic[1] over the past few years and still isn't out of beta. Analog circuit modeling is a huge thing in DSP right now because it's the closest we have to proper 1:1 software clones of analog hardware. But it's incredibly time-consuming.
I'm really excited by this use of WaveNet. It could drastically cut down the time to clone old costly to maintain hardware. But it will have some way to go before you can tweak the parameters in realtime. Or so I assume?
Also not an expert, but that sounds about right to me.
I imagine the difficulty in designing these models comes from modeling the variable factors, i.e., the parameters normally controlled by the knobs on the amp or effect. Some of these should be straightforward (for example, "gain" increasing the volume of the input signal), but I suspect that in some pedals, changing one parameter can affect how the others behave. I don't see any mention of how this "deep learning" model handles that.
Guitar modeling gear has been around for about 25 years (the first Line 6 amp debuted in 1996; I'm not sure if there were earlier products brought to market). Modelers have been derided by purists, but have kind of turned a corner in recent years and are now becoming very mainstream.
Some modern products, such as those sold by Kemper, actually allow you to plug in to your existing gear and generate a profile based on the impulse response. The results, at least according to the reviews I've read, are actually very impressive.
> You could accomplish much the same by using a "depth-1 network" and running an impulse through it
This would be true for a linear impulse response; however, for this kind of effect you need both state/memory (like a convolution) and non-linearity (like a waveshaper), which is why people use RNNs and CNNs.
> We find that the model is able to reproduce a sound nearly indistinguishable from the real analog pedal.
Maybe for the average person or buried in the mix, but the audio samples were easy to distinguish for me as a guitarist. The NN samples' unnatural decay was a dead giveaway.
Yeah - this was clearly audible on my phone speakers, especially during more muddy / multi-note sequences.
While it may not be able to emulate a real pedal to create one’s own sound, it would be interesting / fun for amateurs when applied as a post-filter with an interface that says “make this sound like X famous incredible track” coming out of a stock guitar signal.
Not really a guitarist, but listening to them I couldn't hear a specific difference. Yet I still liked one of them more. And when I clicked "reveal" that one was the real one, turns out.
The real one has longer fading tones; the one generated by machine learning cuts the sound off abruptly.
It seems easy for me to differentiate them, and I’m a beginner with guitars (~1 month, so I’m your average Joe). It’s pretty good though; I’m sure it can be improved greatly.
Sounds great and I had to listen to both of the samples to guess correctly.
That being said the Tube Screamer is a somewhat simple effect: it's just a distortion with the clipping diodes moved to the feedback loop.
How possible would it be to get the famous A/B class amplifier voltage sag and associated changes in parameters of the whole amplifier, or in other words "will it chug"?
I think this would be very possible - there was quite a bit of discussion of using NN techniques for modelling fx discussed at DAFx2019 (http://dafx2019.bcu.ac.uk/). There are a number of papers discussing different techniques in the paper archive.
Many of the techniques discussed were variations on image processing: transforming the input to the frequency domain, converting this to an image, applying standard techniques to transform the image, then converting back to the time domain. There are many compromises with this approach (losing phase information, for example), but with a suitable overlap/add the results were better than I expected, and there's certainly room for further investigation to see if there's useful stuff in there.
Another time-domain approach that was applicable to your amplifier model question was an attempt to determine hidden variables in a circuit. Basically, the circuit under test is examined, and rather than build a SPICE model (which can be laborious), the technique was to expose the internal voltages following components with memory (capacitors, for example). These outputs were included in the NN training model, so in effect the normally hidden internal state was exposed, which allowed for a very good approximation.
Unfortunately not, it's been delayed. DAFx2020 was due to be in Vienna, and I'm assuming they are still planning on being there, but it's now scheduled for 2021.
It's a great conference, well worth attending. It's heavy on the maths, but that's DSP for you!
"many purists argue that the sound of analog pedals can not be replaced by their digital counterparts."
Truly effective modelling of analog pedals, tube amps and guitar cabs has been around for years and is way more cost effective from the bedroom to touring bands.
The "purists" are hipsters who value the rarity of some pedals, massive pedalboards and their tube amps. I'm not knocking them - I understand why there is a nostalgia factor and tweaking dials is cool. As a computer guy though, I much prefer the ability to make things like this in my bedroom:
https://i.imgur.com/OqMoBxz.png
And when I want to tweak a dial, I program an expression foot controller to tweak any parameter (or multiple).
All that said, great to be looking at modelling techniques...
I am by no means a musician, let alone an experienced one. I tinker and enjoy playing and learning, but I have limited experience overall.
My personal experience with electronic tools is the lack of feel. Can I make music with digital tools like AxeFX and similar? Absofreakinglutely. No doubt about it.
But those digital tools feel VERY different to me than the real thing. I'm not just talking about a speaker moving air, though that's certainly part of it. My tube amp simply responds differently than any digital model of a similar amp.
I find tools like the Kemper to be amazing, but they're just a snapshot of an amp in a particular configuration in a particular room.
From a technical standpoint, all this modeling stuff is super cool. But it doesn't feel the same at the end of the day and this is a personal opinion and preference on my part.
I look forward to the day that I can get an amp in a pedal (like the Strymon Iridium) and it behaves the same as the real amp. I think Fender's Deluxe Reverb (Tonemaster model) is as close as it has ever gotten, but it very specifically emulates a single amp and does so within a real amp cabinet rather than pushing it out to an audio interface.
Anyway, anything that gets people playing guitar is, in my opinion, a great thing. We live in a golden age of guitar equipment. I don't think it can honestly get much better than it is right now. It's an amazing time to be a guitar player and incredible options are available at amazing prices.
Sorry, bit late here. I generally play my amp on the edge of breakup. For those not familiar, the idea is that when the guitar is played softly you get a clean tone, and when it's played with a bit more aggression you get breakup: distortion, but not a ton of it. Think of a blues tone, where you get just a little bit of fuzz/grit.
The feeling of this is significantly different in almost every emulated/simulated/modeled amp than reality. They can be close, but the "feel" of it on the guitar side is often quite different.
Generally speaking, I feel like I have more control over the sound and how it plays with a real tube amp over a modeled amp.
Does that help? I approached this as if you aren't a guitarist, but if you are, sorry for the boring bits that you already probably know.
>Anyway, anything that gets people playing guitar is, in my opinion, a great thing. We live in a golden age of guitar equipment. I don't think it can honestly get much better than it is right now. It's an amazing time to be a guitar player and incredible options are available at amazing prices.
It sure is a great time for guitar equipment, as the digital revolution has made its way there too.
And it's arguably an opportunity cost for a kid to be pouring so much effort today learning the iconic (but tired) instrument of the boomer generation, when they could be breaking new musical ground instead, mastering Ableton's Push for instance. But to each their own, of course.
In the late 80s it sure seemed like the guitar was doomed. On one hand there was new wave with its synths, and on the other were the guitar "gods" who wanked on with amazing technical precision, making amazingly pedantic music. Then in the 90s the guitar and rock were reborn and suddenly cool again. It will come back and it won't be tired anymore. There are so many things that the guitar has barely hinted at in the past that will resurface as innovation. In the meantime, you can learn the guitar AND new tech. Besides, there will always be the draw of impressing a member of the opposite sex at a party by picking up a guitar. You just don't have that with "check out my latest drum programming," etc.
I'm a guitar noob, but have been wanting to pick up an electric for ages :).
Quick question - how does that Axe-FX compare to various Amp emulators such as AmpliTube, Line 6 Helix Native, Guitar Rig, Positive Grid BIAS Amp, S-Gear, etc... ?
For a guitar noob, I'd say that all of those options are great (and yes, I've used them all). If you were more than a noob and had specific needs, I might recommend a specific one to match those needs. I went with the AxeFx because it is insanely tweakable...
AxeFX is the most true to life, the Helix is quite a bit simpler to use, and the Kemper has the best "feel" of any simulator. They achieve very similar results sound-wise, and all of them can be used in record production, no problem.
IMHO for the bedroom player the Helix is the best solution as it's good enough and significantly cheaper than the other options.
Axe-FX has good hardware, I've owned every unit since the original Axe-FX came out.
If you want to check this out for yourself, try a Line 6 Helix, and then the Helix Native VST with a normal soundcard. For me, the difference was orders of magnitude. Good modelling boils down to having excellent hardware.
In a similar vein, I worked (i.e., interned) at a few recording studios in my 20s. Most tracking at each was done to a 2" 24-track analog tape deck, and the majority of post and mixing was done digitally. I don't know what progress has been made in plug-ins in 20 years (I suspect a lot), but at that point there was nothing digital that came close to the sound of electric guitars overdriven into the tape deck, saturating the tape to an extreme. Now I'm curious whether anybody has gotten it right, but there are fewer and fewer studios with 2" tape decks to do a true A/B.
Effective modeling, yes, but not necessarily accurate modeling. The analog circuits are imperfect in many subtle ways, and component-level simulation is rare (if it exists at all, I have not seen it). It’s all a bunch of high-level approximations that don’t nail the feel well enough to beat blind tests.
It can be done. I don’t know why we aren’t there.
The audio world is halfway to the alien truther community: the closer a rational outsider looks at it, the crazier they feel. Technically, it’s a trivial field. Yet here we are with snake oil saturation and subpar solutions.
This guy has been doing component level simulation from the beginning. I have one and it is accurate enough to convince some pretty big players to ditch their tube amps.
I think what you mean by 'hipsters' is 'professionals'. As someone who's made several records and been in many recording studios, I would challenge you to name a single record that does not utilize an analog signal chain, for mastering at the very least. VST modeling is great when you want a super clear tone and is very popular in certain genres. But definitely not ubiquitous and certainly not superior tech. Digital just don't SLAP like analog.
A lot of bands also stopped using analog hardware because it tends to be far less reliable than its digital counterparts. Many analog amps, pedals and synths change their sound as the hardware ages, while digital gear stays virtually the same. The same can be said about weather conditions: changes in temperature and humidity affect analog hardware, but not so much digital.
You will have the same sound from gig to gig, and a lot of bands really value this.
I guess you are technically right, but this part of the discussion is highly subjective. I was merely pointing out that the quoted statement was subjective and I wasn't using "hipster" as a pejorative - I was actually being somewhat sympathetic to their view.
Overall, my goal was to add to this discussion by pointing out the massive progress that has been made and also to show off my supercool signal path in the hopes that it would be inspirational to fellow geeks like me.
It would have gone over better with me (your average analog hipster) if you'd just mentioned the positive aspects of the thing you like. I'm eagerly awaiting the day when modeling is actually good enough for me; and your (common) attitude (that it is, obviously, and anyone who can't hear it is nostalgic/superstitious/hipster) is one of the reasons I don't give modelers a try more often.
It wasn't an ad hominem attack. Ad hominem quite literally refers to an attack against a specific person. Not only was he not attacking a person, but referring to 'hipsters' is not necessarily pejorative.
I believe he was incorrect to call guitar players who use analog equipment hipsters, as using analog equipment is the status quo, not some niche subculture outside of the mainstream.
I would like to respectfully suggest being a little less sensitive, though. Not giving new things a chance because of other people's attitudes seems very silly to me.
Hipster is an insult; do people call themselves hipsters? Not usually.
I've tried a lot of modelers. If all I ever hear is that I'm defective for not thinking they're perfect, why would I be open? It feels like a crusade with a side of propaganda. I really do want them to be good.
That is very cool. Part of the pedal, though, is of course the knobs. You'd need to condition the WaveNet on the knob settings. Did that work well (I assume you've already tried it)?
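For readers wondering what "conditioning on the knobs" means mechanically, here is a minimal sketch of one WaveNet-style gated layer with global conditioning: the knob vector is projected and added inside both the tanh (filter) and sigmoid (gate) paths. The weights are hypothetical and untrained, there is one channel only, and the knobs are assumed constant for the clip; it only shows where the knob values enter, not how the article's model was built.

```python
import math

def gated_layer(x, knobs, w_f, w_g, v_f, v_g, dilation=1):
    """One dilated causal convolution with a gated activation,
    globally conditioned on a vector of knob settings (held constant over the clip).

    w_f, w_g: 2-tap kernels applied to (x[n - dilation], x[n]).
    v_f, v_g: one weight per knob for the filter and gate paths.
    """
    cond_f = sum(v * k for v, k in zip(v_f, knobs))   # knob term added to the filter path
    cond_g = sum(v * k for v, k in zip(v_g, knobs))   # knob term added to the gate path
    out = []
    for n in range(len(x)):
        past = x[n - dilation] if n >= dilation else 0.0   # causal: zeros before the start
        f = w_f[0] * past + w_f[1] * x[n] + cond_f
        g = w_g[0] * past + w_g[1] * x[n] + cond_g
        out.append(math.tanh(f) / (1.0 + math.exp(-g)))    # tanh(f) * sigmoid(g)
    return out

# Hypothetical, untrained weights; knobs are [drive, tone, level] scaled to 0..1.
weights = dict(w_f=[0.5, 1.5], w_g=[0.2, 0.8], v_f=[1.0, -0.3, 0.2], v_g=[2.0, 0.1, 0.5])
signal = [math.sin(2 * math.pi * 110 * n / 44100) for n in range(256)]
low_drive = gated_layer(signal, [0.1, 0.5, 0.5], dilation=2, **weights)
high_drive = gated_layer(signal, [0.9, 0.5, 0.5], dilation=2, **weights)
print(max(abs(a - b) for a, b in zip(low_drive, high_drive)))  # non-zero: knobs change the output
```

Training such a network on recordings made at many knob positions is what would let it interpolate between settings, rather than needing one model per setting.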
Also, what is the inference latency on your model? A nice thing about analog guitar effects is that they are blazingly fast.
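To put numbers on "real-time": a plugin adds at least one I/O buffer of latency (buffer length divided by sample rate), and its inference for each buffer has to finish within that same period. A causal WaveNet-style model looks only at past samples, so its receptive field is history rather than added delay. The buffer size, sample rate and layer layout below are illustrative assumptions, not figures from the article.

```python
def buffer_latency_ms(buffer_samples: int, sample_rate_hz: float) -> float:
    """Time represented by one audio buffer, in milliseconds."""
    return 1000.0 * buffer_samples / sample_rate_hz

def receptive_field(kernel_size: int, dilations: list[int]) -> int:
    """Past samples seen by a stack of dilated causal convolutions (history, not delay)."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# Assumed settings: 128-sample buffers at 44.1 kHz; 10 layers with dilations 1, 2, ..., 512.
print(round(buffer_latency_ms(128, 44100), 2))   # 2.9 (ms per buffer, also the compute budget)
print(receptive_field(3, [2 ** i for i in range(10)]))   # 2047 samples of history
```

A real round trip usually also includes the interface's input and output buffers and converter delay, so treat the per-buffer figure as a lower bound.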
So this seems similar to an IR (impulse response) where you get a snapshot of an amp mic'd up in a room with knobs fixed at a particular position. In the end, you don't get knobs to fiddle with.
Awesome, I'd love to hear Josh from JHS Pedals' opinion on this.
Impulse responses can only represent linear time-invariant systems. Like delays, reverbs, equalization curves.
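A minimal sketch of what that linearity means in practice, using a short made-up IR: convolution scales exactly with its input, so doubling the input doubles the output, while a clipper does not behave that way. That scaling property is exactly what a single IR cannot escape.

```python
import math

def convolve(x, h):
    """Direct-form linear convolution (what an IR loader does, minus FFT speedups)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def hard_clip(x, limit=1.0):
    """Memoryless max(-limit, min(limit, s)) waveshaper."""
    return [max(-limit, min(limit, s)) for s in x]

ir = [1.0, 0.5, 0.25, 0.125]                 # hypothetical decaying IR
x = [math.sin(0.1 * n) for n in range(64)]
x2 = [2.0 * s for s in x]

# Linear: convolving 2x the input gives exactly 2x the output.
print(all(abs(a - 2.0 * b) < 1e-12 for a, b in zip(convolve(x2, ir), convolve(x, ir))))  # True
# Non-linear: clipping 2x the input does not give 2x the clipped output.
print(all(abs(a - 2.0 * b) < 1e-12 for a, b in zip(hard_clip(x2), hard_clip(x))))       # False
```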
Distortion is non-linear: it is something like a max(-1, min(1, input)) function (a waveshaper, like you said), and it produces harmonics when applied to audio signals.
However, guitar pedals also have some additional circuitry to "sweeten" the distortion, removing the extra harmonics added by the clipping diodes. Tube Screamers are notable for cutting bass and enhancing mids. An IR is able to capture this. This filtering is important for guitar pedals, and it is the reason so many different ones exist.
If you capture the impulse response of an overdrive pedal, you'll be capturing only the frequency response of a distorted impulse. If you process clean guitar through this, you'll reproduce the frequency response but not the distortion itself, so it will just be a clean guitar with a tinny, shrill sound, not an overdriven guitar sound.
One way around it (other than the idea in this article!) is to do multiple passes of impulse-response capture at different amplitudes, which captures the distortion non-linearity. This is supposedly how a Kemper Profiler works.
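To see the harmonics appear, the sketch below drives that exact max(-1, min(1, x)) waveshaper with a sine of amplitude 1.5 and measures a few harmonic bins with a direct DFT (stdlib only, no FFT library assumed). Because the clipping is symmetric, the even harmonics come out as essentially zero while the third harmonic is clearly present; the amplitude and window size are arbitrary choices.

```python
import cmath
import math

def harmonic_levels(signal, fundamental_bin, count=5):
    """Amplitude of DFT bins k*fundamental_bin for k = 1..count (direct DFT)."""
    n = len(signal)
    levels = []
    for k in range(1, count + 1):
        b = k * fundamental_bin
        acc = sum(s * cmath.exp(-2j * math.pi * b * i / n) for i, s in enumerate(signal))
        levels.append(abs(acc) * 2 / n)
    return levels

N, BIN = 1024, 8                      # exactly 8 cycles in the window, so no spectral leakage
sine = [1.5 * math.sin(2 * math.pi * BIN * i / N) for i in range(N)]
clipped = [max(-1.0, min(1.0, s)) for s in sine]   # the max(-1, min(1, x)) waveshaper

for k, level in enumerate(harmonic_levels(clipped, BIN), start=1):
    print(k, round(level, 3))
```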
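A heavily simplified sketch of the multi-amplitude idea, assuming you already captured one IR per test level: for each block of input, pick the IR captured nearest the block's RMS level and convolve with it. This is a crude, level-switched stand-in, not Kemper's actual (unpublished) algorithm; proper dynamic convolution selects or interpolates kernels per sample, and the IR values here are invented.

```python
import math

def convolve(x, h):
    """Direct linear convolution."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def rms(block):
    return math.sqrt(sum(s * s for s in block) / len(block))

def level_switched_convolve(x, irs_by_level, block=64):
    """For each block, convolve with the IR captured at the level nearest the block's RMS,
    then overlap-add the results (switching IRs abruptly at block edges)."""
    levels = sorted(irs_by_level)
    longest = max(len(h) for h in irs_by_level.values())
    y = [0.0] * (len(x) + longest - 1)
    for start in range(0, len(x), block):
        chunk = x[start:start + block]
        r = rms(chunk)
        level = min(levels, key=lambda lv: abs(lv - r))
        for i, s in enumerate(convolve(chunk, irs_by_level[level])):
            y[start + i] += s
    return y

# Hypothetical captures: the IR measured with a loud test signal is more "squashed".
irs = {0.1: [1.0, 0.6, 0.3], 0.5: [0.8, 0.4, 0.1], 1.0: [0.5, 0.2, 0.05]}
quiet = level_switched_convolve([0.1] * 64, irs)
loud = level_switched_convolve([1.0] * 64, irs)
print(quiet[0], loud[0])  # 0.1 0.5: each used the IR captured nearest its own level
```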
It has been said that if we achieve the ability to fully simulate the universe from initial conditions, the first application will be creating a perfect recreation of Marvin Gaye's Roland 808 drum machine in a 1982 performance.
Isn't this essentially just learning a single function with fixed parameters?
I.e., if you want to build a complete model of the Tube Screamer, you'd essentially have to train a model for each possible setting on the pedal, or in other words, every combination of the knobs.
Sounds like a real chore, if you were to actually do that physically - and in the end, don't you just want to learn the impulse response of the circuit?
I know some tools, like the Kemper modelling gear, are made for that exact purpose, with extremely convincing results.
Not quite. As long as the knobs make consistent changes, you can just feed in a large number of test cases and the model should generalize (smartly interpolate) to the rest.
What I do have a problem with is that if the pedal is already implemented digitally, then all the human interpretability, along with the classic DSP machinery, is thrown out the window. A better approach would be to build the pedal via a differentiable programming language and then try to gradient descent toward some analog "can't get this juicy tube sound digitally" variant.
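A tiny sketch of the "interpretable model fitted by gradient descent" idea, assuming the pedal can be approximated by y = tanh(drive * x) with a single unknown drive parameter. The gradient is derived by hand rather than produced by a differentiable programming language, and the target is synthetic, not a recorded pedal; the point is only that the fitted parameter stays meaningful.

```python
import math

def model(x, drive):
    """Hypothetical one-parameter pedal model: a tanh waveshaper."""
    return [math.tanh(drive * s) for s in x]

def fit_drive(x, target, drive=1.0, lr=2.0, steps=3000):
    """Plain gradient descent on the mean squared error between model(x, drive) and target."""
    for _ in range(steps):
        grad = 0.0
        for s, t in zip(x, target):
            y = math.tanh(drive * s)
            # d/d(drive) of (y - t)^2 = 2 * (y - t) * (1 - y^2) * s
            grad += 2.0 * (y - t) * (1.0 - y * y) * s
        drive -= lr * grad / len(x)
    return drive

x = [math.sin(2 * math.pi * n / 64) for n in range(64)]
target = model(x, 3.0)                   # synthetic stand-in for a recorded pedal output
print(round(fit_drive(x, target), 3))    # should land very close to 3.0
```

A realistic version would have many parameters (filter corners, clipping curve shape) and would lean on automatic differentiation instead of hand-written gradients.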
The knobs actually don't behave linearly on a Tube Screamer. Even the "tone" knob (EQ) doesn't behave linearly the way you might expect from consumer audio gear. Tube Screamers use an S-curve potentiometer for that knob.
That would be part of the problem with this approach.
Also, with this approach you pretty much have to train the model with a near-infinite collection of guitars in front of it and a near-infinite number of other effects turned on and off in front of it.
Excellent writeup, I love seeing real engineering applied to guitar pedals rather than black magic tone chasing.
I'd be really curious to see if the model could be expressed as a transfer function and compared to the schematic for the pedal. The Tubescreamer is a fairly simple circuit but the mystery surrounding it indicates that there are some weird variables at play with the component properties that would lead to additional factors in the transfer function. Wonder if those variables could be identified somehow.
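As a starting point for that comparison, here is a sketch of the linear part of the clipping stage written as a transfer function, with the diodes ignored. It assumes the commonly quoted TS808 values (4.7 kΩ and 0.047 µF to ground; 51 kΩ plus a 500 kΩ drive pot and 51 pF in the feedback path) and a non-inverting op-amp stage H(s) = 1 + Zf(s)/Zg(s); check them against your own schematic, since reissues and clones differ. The bass cut below roughly 720 Hz and the treble roll-off from the feedback capacitor are what give the mid-forward response.

```python
import math

def ts_gain_stage(f_hz, drive=0.5, r_g=4.7e3, c_g=47e-9, r_f0=51e3, pot=500e3, c_f=51e-12):
    """Magnitude of H(s) = 1 + Zf/Zg for a non-inverting op-amp stage, diodes ignored.

    drive: fraction (0..1) of the drive pot in circuit, treated as linear here;
    the real pot taper differs. Component defaults are assumed, not measured.
    """
    s = 2j * math.pi * f_hz
    z_g = r_g + 1.0 / (s * c_g)          # series R-C to ground: high-pass corner near 720 Hz
    r_f = r_f0 + drive * pot
    z_f = r_f / (1.0 + s * r_f * c_f)    # feedback R in parallel with C: rolls off the treble
    return abs(1.0 + z_f / z_g)

for f in (100, 720, 3000, 20000):
    print(f, "Hz:", round(20 * math.log10(ts_gain_stage(f)), 1), "dB")
```

Differences between this curve and a model's learned response would point at the "weird variables" (component tolerances, diode behaviour, pot taper).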
The "weird variables" may have to do with the various changes in manufacturing over the years. "Tube screamer" refers to at least 10 different units. Maxon, Ibanez, TS9, TS808, and zillions of clones.
A neat approach for sure. I am more interested in SPICE style modeled VSTs though. There's no need to throw ML at a simple math problem to get a bad approximation. I have not found many VSTs that seem like they're doing proper simulation of analog circuits. The VST space is filled with people claiming awesome results, but never revealing the sauce. If you're making a convincing sounding zener limiter, what are you actually doing? There are a dozen different levels of approximations you could make. Shouldn't a VST that is really simulating the analog circuit advertise that? On paper it should be easy, right? I've sat down with pen and paper to try to write out a simple input/output equation for a zener limiter circuit and I decided it was probably more worth my time to just plop a zener SPICE model into some language that could evaluate expressions and compile to VST (or use a systems of equations solver).
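For a feel of what "plop a diode model in and solve it" looks like, here is a sketch of a static shunt clipper: a series resistor feeding an antiparallel pair of identical diodes to ground, solved per sample with damped Newton iterations on the Shockley equation. It is a simplification: a zener limiter would need an extra reverse-breakdown term in the current equation, any capacitors would turn this into an ODE that must be discretized, and all component values are illustrative assumptions.

```python
import math

def diode_clipper_sample(vin, v_guess, r=2.2e3, i_s=1e-12, n=1.0, vt=0.02585, iters=100):
    """Solve (v - vin)/R + 2*Is*sinh(v/(n*Vt)) = 0 for the node voltage v with Newton's method."""
    v = v_guess
    for _ in range(iters):
        x = v / (n * vt)
        f = (v - vin) / r + 2.0 * i_s * math.sinh(x)      # KCL: resistor current = diode current
        df = 1.0 / r + 2.0 * i_s / (n * vt) * math.cosh(x)
        step = max(-0.1, min(0.1, f / df))   # damp steps so the exponential can't blow up
        v -= step
        if abs(step) < 1e-12:
            break
    return v

def diode_clipper(signal):
    """Process a whole signal, warm-starting each solve from the previous sample's answer."""
    out, v = [], 0.0
    for vin in signal:
        v = diode_clipper_sample(vin, v)
        out.append(v)
    return out

sine = [5.0 * math.sin(2 * math.pi * i / 100) for i in range(100)]
print(round(max(diode_clipper(sine)), 3))   # clamps a 5 V peak to a little over half a volt
```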
And then there's the real holy grail of analog simulation: the tube amplifier. I'm not sure SPICE models really capture the limiting behavior of tubes very well. You might need to implement the spec sheet in code. All fun sounding problems, and I'm not sure anyone has even done them yet.
I think that because most guitar effects are really very simple (gain and clipping, or delaying the signal and adding it back in), this approach will probably work just fine.
But it is sort of using a sledgehammer where a tap from a spoon would do: the original Tube Screamer is just an op amp and a couple of diodes, plus a bit of EQ! Not much to it.
Plus, your real problems are going to be noise level (Tube Screamers in particular are noisy, but a discrete transistor distortion can be made very, very quiet), your A/D converter, your power requirements (comparable analog distortion effects use a few milliwatts) and cost.
Edit: But that said, this is a super cool project! Good job! Sorry I just realized that what I wrote was kind of negative.
AFAIK Kemper performs multiple passes of impulse-response capture at multiple signal levels in order to model non-linearities (like distortion). This is called dynamic convolution. [1] [2]
There are other ways to do that, like Volterra Series, used by Nebula plugins [3]
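For reference, a truncated second-order Volterra series adds a quadratic kernel h2 on top of the ordinary linear IR h1: y[n] = Σ h1[k]·x[n-k] + Σ Σ h2[k1][k2]·x[n-k1]·x[n-k2]. The sketch evaluates that sum directly with tiny made-up kernels; it is not Nebula's implementation, only the textbook form of the series.

```python
def volterra2(x, h1, h2):
    """Second-order Volterra series with memory len(h1); h2 is a len(h1) x len(h1) kernel."""
    m = len(h1)
    y = []
    for n in range(len(x)):
        past = [x[n - k] if n - k >= 0 else 0.0 for k in range(m)]   # x[n], x[n-1], ...
        linear = sum(h1[k] * past[k] for k in range(m))               # ordinary convolution
        quadratic = sum(h2[k1][k2] * past[k1] * past[k2]              # pairwise products
                        for k1 in range(m) for k2 in range(m))
        y.append(linear + quadratic)
    return y

h1 = [1.0, 0.5]                     # hypothetical linear kernel (an ordinary IR)
h2 = [[0.25, 0.0], [0.0, 0.0]]      # hypothetical quadratic kernel: adds 0.25 * x[n]^2
print(volterra2([1.0, 0.0, 0.0], h1, h2))   # [1.25, 0.5, 0.0]
print(volterra2([2.0, 0.0, 0.0], h1, h2))   # [3.0, 1.0, 0.0]: not 2x the above, so not LTI
```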
It would be interesting to see how this responds to dynamics. For example, a favorite guitar sound is a fuzz cranked, but with the guitar volume turned down. This results in a compressed dirty sound that can overdrive into distortion if you hit the strings harder (attack).
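One simple way to probe that: drive the model with sines at several input levels (a stand-in for rolling the guitar volume back versus digging in) and measure total harmonic distortion at each. The sketch uses a hypothetical tanh soft clipper as the "model"; any function mapping a list of samples to a list of samples could be plugged in instead.

```python
import cmath
import math

def thd(model, amplitude, n=1024, cycles=8, harmonics=9):
    """Total harmonic distortion of model() driven by a sine at the given amplitude."""
    x = [amplitude * math.sin(2 * math.pi * cycles * i / n) for i in range(n)]
    y = model(x)

    def bin_mag(k):
        return abs(sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(y)))

    fundamental = bin_mag(cycles)
    rest = math.sqrt(sum(bin_mag(h * cycles) ** 2 for h in range(2, harmonics + 1)))
    return rest / fundamental

def soft_clip(xs):
    """Hypothetical stand-in for the pedal model under test."""
    return [math.tanh(3.0 * s) for s in xs]

for level in (0.01, 0.1, 0.3, 1.0):
    print(level, round(100 * thd(soft_clip, level), 2), "%")
```

Static sine sweeps like this miss attack-dependent behaviour, so plucked or enveloped test signals would be the natural next step.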
I play guitar and own a tube amp & a tube screamer.
All of this sounds horrible. It doesn't even sound like his input is an actual guitar; it sounds like he's using a synth guitar sound or something. There are no dynamics, almost no sustain, no articulations. The outputs barely even sound distinguishable as a guitar through a Tube Screamer, even his actual Tube Screamer samples. (Possibly because his interface is terrible?)
The conclusion is ridiculous given how simplistic everything is.
You can't use two tiny little clips to justify your model being high quality.
A true test would have to let a bunch of guitarists move all the knobs, plug the model into different amp and guitar combinations, put other effects in front of and behind it, etc.
The Tube Screamer is called a Tube Screamer because its intended use is to make the tubes in a tube amp "scream". Using it with all the knobs at noon is not consistent with this. It usually gets used with a tube amp that is already on the verge of distortion, with the TS volume turned up a lot (3/4 to max) and the gain quite low. This might be part of why this sounds so bad to me.
There are actually two different trains of thought on guitar effect modeling:
- Model it based on input & output waveforms like he's doing
- Actually model the circuit as an electrical simulation and then pass the signal through that.
I have personally found the second approach to be way more realistic and satisfying. The Yamaha THR amps work this way and they're really amazing.
One of the tricks here is that a listener might not be able to tell a difference, but the guitar player picks up on a perceived change in how the guitar feels with these effects. A Tube Screamer has a lot of compression built into it, for example. It makes everything you play sound a little dirtier for the same amount of picking energy you put into the guitar, which causes the player to play a little more lightly than they would without the effect. This is the kind of thing that makes a player reject the model and want to stick with the real thing, whereas the guy in the naive lab building the model thinks it's great because they're not even playing an actual guitar through it. Once a skilled player tries it, the "feel" is a dead giveaway as to which is which.
It's easy for some of this stuff to get lost on the electronics crowd if the background is electronic music. An actual acoustic piano is the only keyboard based instrument that has anywhere near the nuance that a guitar has, and a guitar still has way more weird stuff going on with dynamics and articulation. The range of inputs you have to feed into any kind of computer model to simulate guitar well is huge.
B-but a simple convolution would do the same.
Or for faster operation - a transfer function obtained using least squares method.
NN is kinda overkill for this, but it's a cool POC anyway ;)
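For the linear part, a minimal least-squares identification sketch: given an input x and a measured output y, solve the normal equations (XᵀX)h = Xᵀy for FIR taps h so that y[n] ≈ Σ h[k]·x[n-k]. Like a plain convolution it only captures LTI behaviour (the EQ), not the clipping, and the "measured" data here is synthetic, generated from a hypothetical response.

```python
import random

def fit_fir(x, y, taps):
    """Least-squares FIR estimate h such that y[n] ~ sum_k h[k] * x[n-k]."""
    # Build the normal equations A h = b with A = X^T X and b = X^T y.
    rows = [[x[n - k] if n - k >= 0 else 0.0 for k in range(taps)] for n in range(len(x))]
    a = [[sum(row[i] * row[j] for row in rows) for j in range(taps)] for i in range(taps)]
    b = [sum(row[i] * yn for row, yn in zip(rows, y)) for i in range(taps)]
    # Gaussian elimination with partial pivoting.
    for col in range(taps):
        pivot = max(range(col, taps), key=lambda i: abs(a[i][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for i in range(col + 1, taps):
            factor = a[i][col] / a[col][col]
            for j in range(col, taps):
                a[i][j] -= factor * a[col][j]
            b[i] -= factor * b[col]
    # Back substitution.
    h = [0.0] * taps
    for i in reversed(range(taps)):
        h[i] = (b[i] - sum(a[i][j] * h[j] for j in range(i + 1, taps))) / a[i][i]
    return h

random.seed(1)
true_h = [0.6, 0.3, -0.1, 0.05]                      # hypothetical "pedal" response
x = [random.gauss(0.0, 1.0) for _ in range(2000)]    # white-noise test signal
y = [sum(true_h[k] * x[n - k] for k in range(len(true_h)) if n - k >= 0) for n in range(len(x))]
print([round(v, 4) for v in fit_fir(x, y, taps=4)])  # [0.6, 0.3, -0.1, 0.05]
```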
Trey Anastasio of Phish famously uses 2 stacked tube screamers. (And so do many of us phans). He deserves to be mentioned because more notes have hit audience ears through his screamers than anyone else's.
Also, the modern TS9 isn't exactly right. I'd love to see this work applied to vintage vs current TS vs modded units.