by astrea 1659 days ago
Thus perpetuating the NN "if all you have is a hammer" trend. I don't know the nuances of music theory, but couldn't you get away with traditional Fourier analysis? You just need to decompose the song into its constituent frequency "bins", right?

The trouble with analysing "notes" in a composition (or even just with a polyphonic instrument) is the pesky harmonics. The timbre of each instrument produces harmonics that happen to coincide with the fundamental frequencies of other notes.
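To make the harmonics problem concrete, here is a minimal sketch using NumPy. It assumes a made-up timbre (amplitudes 1.0, 0.5, 0.3 for the first three harmonics) and an 8 kHz sample rate; `synth_note`, `strongest_bins` and `nearest_note` are hypothetical helper names, not anything from a transcription library. A single synthetic A2 at 110 Hz puts energy in bins that a naive reading would label as three different notes.

```python
import numpy as np

SR = 8000   # sample rate in Hz (assumed)
DUR = 1.0   # one second of audio, so FFT bins are 1 Hz apart

def synth_note(f0, amps=(1.0, 0.5, 0.3), sr=SR, dur=DUR):
    """Sum of sinusoids at f0, 2*f0, 3*f0 with the given (made-up) amplitudes."""
    t = np.arange(int(sr * dur)) / sr
    return sum(a * np.sin(2 * np.pi * f0 * (k + 1) * t) for k, a in enumerate(amps))

def strongest_bins(signal, n_peaks, sr=SR):
    """Frequencies (Hz) of the n_peaks largest FFT magnitudes, sorted ascending."""
    mag = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    top = np.argsort(mag)[-n_peaks:]
    return sorted(float(f) for f in freqs[top])

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq, a4=440.0):
    """Name of the equal-tempered note closest to freq, with A4 = 440 Hz."""
    midi = int(round(69 + 12 * np.log2(freq / a4)))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"

# One note played, three "notes" visible in the spectrum.
peaks = strongest_bins(synth_note(110.0), n_peaks=3)
print([(f, nearest_note(f)) for f in peaks])
```

Only A2 was played, but the third harmonic at 330 Hz sits within a fraction of a hertz of E4 (about 329.6 Hz), so the bins alone cannot tell you whether someone also played an E.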
Would it be possible to tease out the harmonics by saying: for 'x' type of guitar tuned in 'y' way with 'z' effect, the harmonics look like this? For example, if the fundamental frequency is f, then the whole note looks like 0 dB at f, -5 dB at 2f, and -10 dB at 3f, or something like that. Then, when you're looking at the frequency domain, you start from the lower notes and say, "Hmm, looks like there's a fundamental at f; are the expected harmonics there?" If yes, that's the note; if no, it's something else.
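The heuristic described above could be sketched roughly as follows. This is a toy under strong assumptions: every note has exactly the 0 dB / -5 dB / -10 dB profile from the comment, the timbre never changes, fundamentals fall on whole-hertz bins, and the candidate list is known in advance. `synth` and `detect_notes` are hypothetical names, and the `floor` and `tol` thresholds are arbitrary.

```python
import numpy as np

SR = 8000  # sample rate; with one second of audio each FFT bin is 1 Hz wide
# Relative harmonic levels from the comment: 0 dB at f, -5 dB at 2f, -10 dB at 3f.
TEMPLATE_DB = (0.0, -5.0, -10.0)
TEMPLATE = tuple(10 ** (db / 20) for db in TEMPLATE_DB)

def synth(f0s, sr=SR):
    """Mix of notes, each built from TEMPLATE (an idealised, static timbre)."""
    t = np.arange(sr) / sr
    out = np.zeros(sr)
    for f0 in f0s:
        for k, a in enumerate(TEMPLATE, start=1):
            out += a * np.sin(2 * np.pi * k * f0 * t)
    return out

def detect_notes(signal, candidates, floor=0.2, tol=0.25):
    """Greedy lowest-first search: accept f if its harmonics are present, then subtract them."""
    mag = np.abs(np.fft.rfft(signal)) / (len(signal) / 2)  # amplitude per 1 Hz bin
    found = []
    for f in sorted(candidates):
        base = mag[f]
        if base < floor:
            continue  # no energy left at the would-be fundamental
        expected = [base * a for a in TEMPLATE]
        ok = all(mag[k * f] >= e * (1 - tol) for k, e in enumerate(expected, start=1))
        if ok:
            found.append(f)
            for k, e in enumerate(expected, start=1):
                mag[k * f] = max(mag[k * f] - e, 0.0)  # remove the energy this note explains
    return found

print(detect_notes(synth([110]), [110, 220, 330]))
print(detect_notes(synth([110, 220]), [110, 220, 330]))
```

On this idealised input the first call reports only the 110 Hz note, even though 220 Hz and 330 Hz both carry energy, and the second call separates a real 220 Hz note from the 110 Hz note's second harmonic. Real recordings break every assumption listed above, which is what the replies below get into.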
It is absolutely possible (and effective, depending on what you consider to be an adequate ROI) to apply this sort of heuristic to this domain.

The "holy grail" of a universal automatic music transcription (AMT) system that works with any number of instruments of any type played concurrently isn't exactly a tractable problem to begin with. But if you constrain the problem in various ways (to specific instruments, to known tunings, etc.), you can definitely take advantage of a priori knowledge about the "timbre" of the instrument, and about the way the sound wave evolves over the duration of the note or notes, to work around what would otherwise be more ambiguous data. The octave/harmonics problem is one example of the kind of problem that is much easier to eliminate (relative to the abstract case) if you can make assumptions about the type of instrument that is generating the sound.

The overtones generated by the vibrations of a guitar string (for example) follow a fairly specific and distinctive pattern. If you dig a little bit into the physics/mechanics by which a given instrument generates sound there is a lot of tell-tale information to take advantage of.

This is basically how you'd do it with non-negative matrix factorization (NNMF). You take the spectra of a bunch of known notes for that type of guitar and store them in a template library (just a vector of vectors, where each inner vector is the spectrum of one note). Then NNMF determines how much of each template contributes to a given signal. The templates are the "harmonics look like this" thing you're talking about. It works pretty well.
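A minimal sketch of the template idea, assuming made-up templates rather than recorded guitar notes. The template matrix `W` is held fixed and only the activations `H` are estimated, using the standard multiplicative update for the Euclidean cost; with `W` fixed this is effectively non-negative least squares per frame. `note_template` and `activations` are hypothetical names, and the bin positions and amplitudes are invented for illustration.

```python
import numpy as np

def note_template(f0, n_bins=64, amps=(1.0, 0.6, 0.3)):
    """Hypothetical magnitude spectrum of one note: peaks at bins f0, 2*f0, 3*f0."""
    spec = np.zeros(n_bins)
    for k, a in enumerate(amps, start=1):
        spec[k * f0] = a
    return spec

def activations(V, W, n_iter=1000, eps=1e-9):
    """Estimate H >= 0 such that V ~= W @ H, keeping the templates W fixed."""
    H = np.full((W.shape[1], V.shape[1]), 0.5)  # positive starting point
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)    # multiplicative update keeps H >= 0
    return H

# Template library: one column per known note (fundamentals at bins 5, 10, 15).
# The templates overlap on purpose: bin 10 belongs to both note 0 and note 1.
W = np.column_stack([note_template(f0) for f0 in (5, 10, 15)])

# Two-frame "spectrogram": frame 0 has notes 0 and 1 together, frame 1 has note 2 alone.
H_true = np.array([[1.0, 0.0],
                   [0.8, 0.0],
                   [0.0, 0.5]])
V = W @ H_true

H_est = activations(V, W)
print(np.round(H_est, 2))
```

Each row of `H_est` is one note and each column one time frame, so reading off the large entries gives a rough piano roll. On this noise-free toy input the estimate should land close to `H_true` despite the shared bins; real templates and real audio will be much less clean.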

By the way, the reason it is easier to use NNMF than to implement your suggestion as a heuristic is that there's much more overlap between the different notes than you might think, and (worst of all) the timbre of the note:

1. evolves over time!

2. depends on the velocity of the note (how hard the string was plucked or strummed)

3. and the notes actually interact with each other. If you play an E, the A string will mildly resonate too because of the shared harmonics.

No. At the very least you also need to do instrument separation: e.g. the bass might match the root note of the guitar chord at one moment and do something entirely different half a second later. And of course in a coherent song all the different instruments will land in the same frequency bins, either directly or through their overtones. Also, for certain instruments and effects (e.g. guitar amps), the strongest peaks in the Fourier transform of the instrument's output may not match the notes that are actually being played.
As other commenters have expounded on, the short answer is "no". While I definitely agree that one must be careful about falling into the trap of thinking everything is a nail to be hammered with NNs, it's also pretty common to fall into the "this is easy, why don't you just..." trap for things that humans do with (relative) ease.
Well, the NN has to have something to operate on, and I think a Fourier analysis may help as an input.

There is just too much going on in a track. It's really strange for us to say that this mash-up of sounds is "really" A-D-E at the heart of it, and that when I play these chords it will suggest the wall of sound you heard on the record. The net is just capturing our biases.