

by rodw 1655 days ago

Automated music transcription - the process of generating a musical score from an audio recording - is a pretty active topic in signal processing (and has been for a couple of decades at least).

The monophonic case (one note at a time) is fairly well solved at this point: there are decades-old solutions in both the frequency domain (like FFT) and the time domain (like auto-correlation and the dozens of refinements of that basic concept) that work quite well under less-than-ideal conditions and in near real-time. Even a naive solution, like just counting the number of zero-crossings in the audio signal to estimate the fundamental frequency, works pretty well.
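To make that concrete, here's a minimal Python sketch of the two time-domain ideas mentioned above: zero-crossing counting and plain auto-correlation. The function names, the synthetic test tone, and the 50-1000 Hz search range are assumptions made for illustration, not taken from any particular tool. Real recordings would also need windowing, normalization, and some handling of octave errors.

```python
import math

def sine(freq_hz, sample_rate=8000, duration_s=0.25):
    """Generate a pure test tone (hypothetical helper, stands in for real audio)."""
    n = int(sample_rate * duration_s)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate) for i in range(n)]

def pitch_zero_crossings(samples, sample_rate):
    """Estimate f0 by counting rising zero-crossings (about one per period for a clean tone)."""
    rising = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    duration_s = len(samples) / sample_rate
    return rising / duration_s

def pitch_autocorrelation(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate f0 from the lag with the highest auto-correlation.

    Only lags corresponding to frequencies between fmin and fmax are searched
    (an assumed range, chosen here just to keep the example small).
    """
    min_lag = int(sample_rate / fmax)
    max_lag = int(sample_rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        # Correlate the signal with a copy of itself shifted by `lag` samples.
        score = sum(samples[i] * samples[i + lag] for i in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag

if __name__ == "__main__":
    tone = sine(220.0)
    print("zero-crossings :", pitch_zero_crossings(tone, 8000))
    print("autocorrelation:", pitch_autocorrelation(tone, 8000))
```

Both estimates are limited by the integer sample grid, so expect them to land near the true pitch rather than exactly on it. The refinements alluded to above mostly deal with that, and with noise and harmonics.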

The polyphonic case (like chord detection) is trickier, and how tricky depends on what exactly you're looking for. For example, is it sufficient to say "that's a C Major chord" or are you looking for a specific inversion or even fingering? Does it need to happen in real-time from a microphone or could you batch-process an audio file instead?

But there are both academic solutions and consumer-oriented tools that can do a reasonable job of it (again, depending on what you're looking for).

If you're looking for guitar-chord detection in particular, I'd recommend you take a look at Chordify (https://chordify.net/). I say that as the developer of a product that competes with (or at least overlaps with) Chordify: frankly, it pretty much does what it says on the tin (extracts chords from audio recordings with more than acceptable fidelity, especially if you're willing and able to refine that by ear using the automated transcription as a starting point).

I'm pretty sure Chordify's solution is based on "deep learning" (ANN) techniques, but others have noted in this thread that's not the only viable way to do it. I suspect some combination of increasing computational power and algorithmic refinements will eventually lead to a "direct analysis" approach that becomes as common/conventional for polyphonic pitch detection as FFT and auto-correlation are for the monophonic case. There are already a number of fairly effective techniques depending on the constraints you want to put on the problem.

1 comment

cannam 1655 days ago

> The polyphonic case (like chord detection) is trickier

Very true, but for practical purposes chord detection is easier than polyphonic note transcription - it isn't necessary to transcribe all the notes with perfect fidelity to identify a likely chord, and there are many issues around note timing that become simpler when you assume one chord at a time.
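As a rough illustration of why chord labelling tolerates imperfect note detection, here's a minimal Python sketch of chroma template matching: detected spectral peaks are folded into 12 pitch classes and compared against major/minor triad templates. Everything here (the function names, the `(frequency, magnitude)` peak format, the restriction to 24 triads) is an assumption for the example, not how any specific product does it.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def pitch_class(freq_hz, a4_hz=440.0):
    """Map a frequency to a pitch class 0..11 (C = 0) via its nearest MIDI note."""
    midi = round(69 + 12 * math.log2(freq_hz / a4_hz))
    return midi % 12

def chroma_from_peaks(peaks):
    """Fold (frequency, magnitude) peaks into a 12-bin chroma vector, discarding octave."""
    chroma = [0.0] * 12
    for freq_hz, magnitude in peaks:
        chroma[pitch_class(freq_hz)] += magnitude
    return chroma

def chord_templates():
    """Binary templates for the 24 major and minor triads."""
    templates = {}
    for root in range(12):
        for suffix, intervals in (("", (0, 4, 7)), ("m", (0, 3, 7))):
            bins = [0.0] * 12
            for interval in intervals:
                bins[(root + interval) % 12] = 1.0
            templates[NOTE_NAMES[root] + suffix] = bins
    return templates

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def detect_chord(chroma):
    """Return the name of the triad template most similar to the chroma vector."""
    templates = chord_templates()
    return max(templates, key=lambda name: cosine(chroma, templates[name]))

if __name__ == "__main__":
    # E3, C4, G4 plus a weak stray D5: a C major chord with E in the bass.
    peaks = [(164.81, 1.0), (261.63, 0.8), (392.00, 0.7), (587.33, 0.1)]
    print(detect_chord(chroma_from_peaks(peaks)))
```

Because octave information is thrown away, this kind of matcher can't tell inversions or fingerings apart, which is exactly the trade-off raised in the parent comment.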

> I'm pretty sure Chordify's solution is based on "deep learning" (ANN) techniques

At least at launch, I believe they were using a method more like that of Chordino - in fact using the same chromagram decomposition - but with a more sophisticated language model for chord transitions than Chordino's HMM.

(See this publication from one of Chordify's founders et al., which I think is relevant, or at least interesting: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.294...)
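For anyone unfamiliar with what an HMM over chord transitions buys you, here's a minimal Python sketch of Viterbi decoding with a simple "chords tend to stay the same" transition model. To be clear, this is not Chordino's or Chordify's actual model: the function name, the single `stay_prob` parameter, and the per-frame score dictionaries are all assumptions made for illustration.

```python
import math

def viterbi_smooth(frame_scores, stay_prob=0.9):
    """Return the most likely chord sequence given per-frame chord likelihoods.

    frame_scores: list of dicts mapping chord name -> likelihood (> 0), one per frame,
                  all with the same keys (at least two chords).
    stay_prob:    assumed probability of keeping the same chord from frame to frame;
                  the remainder is split evenly among the other chords.
    """
    chords = list(frame_scores[0])
    log_stay = math.log(stay_prob)
    log_switch = math.log((1.0 - stay_prob) / (len(chords) - 1))

    def transition(prev, cur):
        return log_stay if prev == cur else log_switch

    # best[c] = log-probability of the best path so far that ends in chord c.
    best = {c: math.log(frame_scores[0][c]) for c in chords}
    backpointers = []
    for scores in frame_scores[1:]:
        new_best, pointers = {}, {}
        for cur in chords:
            prev = max(chords, key=lambda p: best[p] + transition(p, cur))
            new_best[cur] = best[prev] + transition(prev, cur) + math.log(scores[cur])
            pointers[cur] = prev
        best = new_best
        backpointers.append(pointers)

    # Trace the best path backwards from the most likely final chord.
    path = [max(chords, key=lambda c: best[c])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

if __name__ == "__main__":
    frames = [
        {"C": 0.8, "G": 0.2},
        {"C": 0.8, "G": 0.2},
        {"C": 0.4, "G": 0.6},  # a one-frame glitch that favours G
        {"C": 0.8, "G": 0.2},
        {"C": 0.8, "G": 0.2},
    ]
    print(viterbi_smooth(frames))
```

The per-frame likelihoods could come from something like chromagram template scores. A more sophisticated language model would replace the flat stay/switch probabilities with transitions that depend on which chords are involved.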
