by abdljasser2 1569 days ago
Hello!

Immediate use case would be sampling. Say you like a certain sound in a song and would like to use it as a starting point for your own sound patch.

I also believe that transfer learning has benefits even for making great sounding instruments in cases where you have access to lots of data. That’s my intuition at least.

At the very least, it saves you a lot of memory/bandwidth. Instead of having one large model per instrument, you only need one large model plus a few extra instrument-specific weights.
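The storage argument is simple arithmetic. Here is a rough sketch with made-up sizes (a 5M-parameter base model and 50k instrument-specific parameters are hypothetical numbers, not figures from the project), comparing one full model per instrument against a shared base plus small per-instrument weights:

```python
def storage_params(num_instruments, base_params, adapter_params, shared_base):
    """Total parameters to store or ship for a set of instrument models."""
    if shared_base:
        # One shared base model plus a small instrument-specific set per instrument.
        return base_params + num_instruments * adapter_params
    # One full, independent model per instrument.
    return num_instruments * base_params


# Hypothetical sizes, for illustration only.
separate = storage_params(10, 5_000_000, 50_000, shared_base=False)
shared = storage_params(10, 5_000_000, 50_000, shared_base=True)
# Megabytes at 4 bytes per float32 parameter.
print(separate, shared, separate * 4 / 1e6, shared * 4 / 1e6)
# prints: 50000000 5500000 200.0 22.0
```

With these assumed sizes the shared setup is roughly 9x smaller, and each additional instrument only costs the small instrument-specific part.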


As a sample-user, it would be great to have this available in the toolbox.

Just reusing the original recording of a sample is equivalent to drawing a photorealistic tracing of an image: it represents a ground truth, but it's not illustrated in any particular artistic direction. And this makes the multisample libraries available today akin to "dry references" - they can be convincing as reproductions, some of the time, but you're stitching them together like a collage of photos.

If you throw the sample into a synthesis engine you can push around the parameters, crossfade it into a loop, add some envelopes, modulation and layers, and make it a uniquely stylized instrument, and this is one way to take the source material to a new place by forgoing some realism.
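As a small illustration of the "crossfade it into a loop" step, here is a minimal sketch of a linear loop crossfade. The function names are hypothetical and this is not any particular engine's implementation: the last `fade` samples of the loop are blended toward the material just before the loop start, so the jump from the loop end back to the loop start is seamless.

```python
def crossfade_loop(x, start, end, fade):
    """Return a copy of x[:end] whose last `fade` samples blend into the
    material just before `start`, so jumping from end-1 back to start is smooth."""
    if not (0 < fade <= start and start + fade <= end <= len(x)):
        raise ValueError("need 0 < fade <= start and start + fade <= end <= len(x)")
    out = list(x[:end])
    for i in range(fade):
        g = (i + 1) / fade  # linear ramp from 1/fade up to 1.0
        tail = x[end - fade + i]   # original loop tail
        pre = x[start - fade + i]  # audio that naturally leads into `start`
        out[end - fade + i] = (1 - g) * tail + g * pre
    return out


def render_loop(looped, start, end, num_samples):
    """Play the attack (0..start) once, then cycle through looped[start:end]."""
    out = []
    i = 0
    while len(out) < num_samples:
        out.append(looped[i])
        i += 1
        if i == end:
            i = start  # wrap back to the loop start
    return out
```

A linear (equal-gain) fade suits material that is similar on both sides of the seam; for uncorrelated material an equal-power curve is the usual alternative. Envelopes, modulation, and layers would then be applied on top of the rendered loop.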

Doing the synthesis through style transfer helps move it in a different direction: it gets outside the bounds of directly sequencing performance parameters and makes the performance a little more like an effect, helping to glue the sound. And I think that could be really cool if applied to arbitrary source material.

Where does training start to "fade off" in terms of time spent versus results achieved? There seems to be a dramatic difference between 1 second and 16 seconds, but what about 50 seconds vs 3600 seconds (1 hour)?

I think this is a very interesting question which I currently don’t have the answer to. This is something we hope to answer in the upcoming paper.

    Say you like a certain sound in a song and would
    like to use it as a starting point for your own
    sound patch.
As a bonus, you might be the Lucky Winner of a copyright suit that eventually establishes a whole new area of case law.

Yeah, it would be ridiculous and unreasonable - but so was "I copyrighted these three notes in a row so pay me naow" :(

I thought this was an interesting point, so I had a look to see if timbre is considered a copyrightable element of a work.

It seems it's usually not: https://blogs.law.gwu.edu/mcir/2018/12/20/timbre/

Very interesting! Thank you for this.
This reminds me of a task in my list that has been sitting there for nearly a decade:

  instrument FIR from song (justice - let there be light)
Here is the spectrogram of the sound I'm talking about:

https://imgur.com/kmtoMkd

It's pretty easy to filter out the drums, since most of the energy is in other bands. Looking at the spectrum again, I don't think a simple spectral replication will get the sound right. It looks like there is some sort of beating phenomenon that isn't present at all center frequencies.

I don't think I understand what you mean, but if I do, you could look into using Spleeter. It separates musical stems.

https://news.ycombinator.com/item?id=21431071

https://github.com/deezer/spleeter/wiki/2.-Getting-started#u...

The task I gave myself was to subtract out the drum beat (the song graciously gives the isolated loop before the instrument comes in), then mix/baseband the instrument to whatever frequency I wanted. If all went well I would make a complex FIR filter that I would pass tones into.
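The plan above can be sketched in a few lines, under loud assumptions: the isolated drum loop is taken to be sample-aligned with the mix and at exactly the same gain (a real master rarely cooperates), and `subtract_loop`, `frequency_shift`, and `fir_filter` are hypothetical helper names. Basebanding is done by multiplying with a complex exponential, which shifts the whole spectrum; the shifted snippet can then serve as complex FIR taps.

```python
import cmath
import math


def subtract_loop(mix, loop):
    """Remove a repeating drum loop; assumes perfect alignment and unity gain."""
    return [m - loop[i % len(loop)] for i, m in enumerate(mix)]


def frequency_shift(x, shift_hz, fs):
    """Multiply by exp(j*2*pi*shift_hz*n/fs), moving every component by shift_hz.
    A negative shift_hz mixes a component at +shift_hz... down toward 0 Hz (baseband)."""
    return [v * cmath.exp(2j * math.pi * shift_hz * n / fs) for n, v in enumerate(x)]


def fir_filter(x, taps):
    """Direct-form FIR convolution (complex taps allowed); output length == len(x)."""
    y = []
    for n in range(len(x)):
        acc = 0j
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y
```

Note that a single fixed filter bakes in the same timbre for every pitch, which is exactly the assumption questioned in the next comment.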

This model assumes the timbre is independent of the tone, but I can see now that this assumption is quite wrong and something more complicated (like this ML modeling) would be needed.

That synth is heavily distorted after the voices are summed. (That whole album has so much distortion; it's lovely.)

So not only is timbre not independent of frequency, summing multiple notes is also non-linear. The "beating" this causes is most obvious on the second chord played. The beating is not consistent as the notes change; its rate depends on the difference in frequency between the two notes being played.
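Both effects can be written out, assuming for illustration that the distortion behaves like a memoryless polynomial y = a1*x + a2*x^2 + ... (the actual circuit or plugin on the record is unknown). Two close partials produce a slow envelope whose rate is the frequency difference, and a square-law term explicitly creates a new component at that difference:

```latex
% Beating: two nearby partials = fast carrier times slow envelope
\sin(2\pi f_1 t) + \sin(2\pi f_2 t)
  = 2\cos\big(\pi (f_1 - f_2) t\big)\,\sin\big(\pi (f_1 + f_2) t\big)
% |envelope| repeats every 1/|f_1 - f_2| seconds, so the beat rate is |f_1 - f_2|

% Nonlinear summing: square-law term of y = a_1 x + a_2 x^2 + \dots
% applied to x = \cos\omega_1 t + \cos\omega_2 t
x^2 = 1 + \tfrac{1}{2}\cos 2\omega_1 t + \tfrac{1}{2}\cos 2\omega_2 t
      + \cos\big((\omega_1 - \omega_2) t\big) + \cos\big((\omega_1 + \omega_2) t\big)
```

The difference term moves with the interval between the notes, which fits the beating changing from chord to chord. Odd-order terms such as a3*x^3 add further products at 2*omega1 - omega2 and 2*omega2 - omega1.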

Maybe second-order spectral relationships would get you a bit closer, e.g. the bispectrum / bicepstrum.

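For reference, the bispectrum measures phase coupling between frequency pairs (k1, k2) and their sum k1 + k2, which is the signature a quadratic nonlinearity leaves behind. Below is a minimal sketch of the direct, segment-averaged estimator B(k1, k2) = mean over segments of X[k1] X[k2] conj(X[k1 + k2]); it uses a naive DFT only to stay self-contained (a real analysis would use an FFT, windowing, and overlapping segments), and the function names are hypothetical. The bicepstrum is not covered here.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) DFT, fine for short illustrative segments."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def bispectrum_bin(X, k1, k2):
    """Single-segment estimate: X[k1] * X[k2] * conj(X[k1 + k2])."""
    return X[k1] * X[k2] * X[(k1 + k2) % len(X)].conjugate()


def averaged_bispectrum(segments, k1, k2):
    """Average over segments: coupled phases add up, random phases cancel."""
    total = 0j
    for seg in segments:
        total += bispectrum_bin(dft(seg), k1, k2)
    return total / len(segments)
```

If the component at k1 + k2 has phase equal to the sum of the other two phases (as it would if a distortion created it), every segment contributes the same phasor and the average stays large; with an independent third phase the average decays toward zero.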
>Say you like a certain sound in a song and would like to use it as a starting point for your own sound patch.

Isn't that resynthesis? I mean, this could certainly be an addition to the vocabulary of resynthesis techniques, but is there something more categorical about it?

Differentiable resynthesis. You input a desired goal, it tries to achieve it, through (essentially) trial and error. Then the artist can focus on imagining goals, rather than fiddling with knobs :)
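To make the "trial and error" idea concrete, here is a toy sketch of gradient-based resynthesis, assuming the synth is just a sum of harmonics with learnable amplitudes. The names `harmonic_basis`, `render`, and `fit_amplitudes` are hypothetical, and the gradient is written out by hand instead of coming from an autodiff framework. Real differentiable synths (DDSP-style models, for example) learn many more controls, but the loop is the same: render, compare with the target, and nudge the parameters downhill.

```python
import math


def harmonic_basis(f0, num_harmonics, fs, n):
    """One sine per harmonic of f0, n samples each."""
    return [[math.sin(2 * math.pi * (k + 1) * f0 * t / fs) for t in range(n)]
            for k in range(num_harmonics)]


def render(amps, basis):
    """The 'synth': weighted sum of the harmonic basis signals."""
    n = len(basis[0])
    return [sum(a * b[t] for a, b in zip(amps, basis)) for t in range(n)]


def fit_amplitudes(target, basis, lr=0.5, steps=100):
    """Gradient descent on L = mean((render(amps) - target)^2)."""
    amps = [0.0] * len(basis)
    n = len(target)
    for _ in range(steps):
        y = render(amps, basis)
        err = [yi - ti for yi, ti in zip(y, target)]
        # dL/da_k = (2/n) * sum(err * basis_k)
        grads = [2.0 / n * sum(e * b for e, b in zip(err, bk)) for bk in basis]
        amps = [a - lr * g for a, g in zip(amps, grads)]
    return amps
```

Usage: render a target from known amplitudes, then recover them starting from silence, e.g. `fit_amplitudes(render([1.0, 0.5, 0.25], basis), basis)`. With harmonics that complete whole periods in the window the basis is orthogonal and the error roughly halves each step at `lr=0.5`.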
Just want to chime in and say that I would love to have this ability. Will you be adding more documentation on how a knowledgeable user could use your library to accomplish this? The docs are kinda sparse and I'm not sure how I could actually use it.
Rather than training on a melody, what happens if you train it on a bunch of single notes of the instrument, one recording per note?

Then give it a melody and have it play that melody based on its library of notes for a given instrument?
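For comparison, the classic non-ML way to play a melody from a library of single notes is what a basic sampler does: pick the nearest recorded root note and resample it by the ratio 2^((note - root)/12). A minimal sketch (hypothetical function names; linear interpolation; pitch and duration change together, and there is no legato or timbre change across the keyboard):

```python
def nearest_root(roots, note):
    """Closest recorded MIDI root note to the requested note."""
    return min(roots, key=lambda r: abs(r - note))


def pitch_ratio(root, note):
    """Playback-rate ratio for an equal-tempered shift of (note - root) semitones."""
    return 2 ** ((note - root) / 12)


def repitch(sample, ratio):
    """Read through `sample` at `ratio` input samples per output sample."""
    out = []
    pos = 0.0
    while pos < len(sample) - 1:
        i = int(pos)
        frac = pos - i
        out.append(sample[i] * (1 - frac) + sample[i + 1] * frac)
        pos += ratio
    return out


def play_melody(library, melody):
    """library: dict mapping root MIDI note -> list of samples."""
    out = []
    for note in melody:
        root = nearest_root(library.keys(), note)
        out.extend(repitch(library[root], pitch_ratio(root, note)))
    return out
```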

This is closer to what Redmatica AutoSampler (now part of Logic) used to do. You would just play notes separated by silence; it would split the recording at the silences, detect the root note of each sample, and then make a multisample out of it.
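The split / detect / map workflow described above can be sketched roughly as follows. This is a hypothetical illustration, not Redmatica's or Logic's actual algorithm: the silence threshold, minimum gap, and pitch range are assumed values, and the autocorrelation pitch estimate is crude (clean single notes only, octave errors possible).

```python
import math


def split_on_silence(x, threshold=1e-3, min_gap=100):
    """Return (start, end) pairs for regions where |x| > threshold,
    treating runs of at least `min_gap` quiet samples as separators."""
    regions = []
    start = None
    last_loud = None
    for i, v in enumerate(x):
        if abs(v) > threshold:
            if start is None:
                start = i
            last_loud = i
        elif start is not None and i - last_loud >= min_gap:
            regions.append((start, last_loud + 1))
            start = None
    if start is not None:
        regions.append((start, last_loud + 1))
    return regions


def estimate_f0(x, fs, fmin=80.0, fmax=1000.0):
    """Lag with the largest autocorrelation inside [fs/fmax, fs/fmin]."""
    lag_min = int(fs / fmax)
    lag_max = min(int(fs / fmin), len(x) - 1)
    best_lag = max(
        range(lag_min, lag_max + 1),
        key=lambda lag: sum(x[n] * x[n + lag] for n in range(len(x) - lag)),
    )
    return fs / best_lag


def freq_to_midi(f):
    """Nearest MIDI note number (A4 = 440 Hz = note 69)."""
    return round(69 + 12 * math.log2(f / 440.0))


def build_multisample(x, fs):
    """Map detected MIDI root note -> (start, end) region of the recording."""
    keymap = {}
    for start, end in split_on_silence(x):
        root = freq_to_midi(estimate_f0(x[start:end], fs))
        keymap[root] = (start, end)
    return keymap
```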

But actually reproducing a real instrument involves legato between notes, handling pitch and timbre changes as you play, etc., so it's a much harder problem.

Thanks for that info. I'm familiar with Logic but didn't know that history.

Thanks