Hacker News
by ghaff 3111 days ago
You'd need access to the fingerprints database but otherwise should be fairly straightforward.

What's not straightforward is recognizing cover songs and the like. But that's not only non-trivial but AFAIK can't be done.

2 comments

> What's not straightforward is recognizing cover songs and the like. But that's not only non-trivial but AFAIK can't be done.

Well, you could transcribe the music into actual notes (or musical intervals), and use Smith-Waterman (or any more advanced and more recent technique) to find the song with the best local alignment score, which is roughly the lowest edit distance.
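
To make the idea concrete, here is a minimal sketch, assuming each song is already available as a list of MIDI note numbers (the hard part, per the replies). The function names, the scoring values (match=2, mismatch=-1, gap=-1) and the `catalog` dict are illustrative assumptions, not anything from Shazam or the linked article. Comparing intervals rather than absolute notes makes the match independent of key.

```python
def intervals(notes):
    """Turn MIDI note numbers into successive semitone steps (key-independent)."""
    return [b - a for a, b in zip(notes, notes[1:])]


def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Scoring values are arbitrary assumptions chosen for illustration.
    Only two rows of the DP table are kept, since we need the score only.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            # Local alignment: a cell never drops below zero.
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


def best_match(query_notes, catalog):
    """catalog is a hypothetical dict: song name -> list of MIDI notes."""
    q = intervals(query_notes)
    return max(catalog, key=lambda name: smith_waterman(q, intervals(catalog[name])))
```

Note that Smith-Waterman maximizes a similarity score instead of minimizing a distance, and a linear scan like `best_match` would not scale to a large catalog without some index in front of it.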

Converting digital audio to notes is both not as easy as it sounds and not how Shazam works.

https://www.toptal.com/algorithms/shazam-it-music-processing...

Yes, you can look at the frequency with the highest intensity in the FFT. This is the "dumb" version of converting music to notes (and is what I really meant, but I left it out for the sake of brevity).
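
A rough sketch of that "dumb" version, under some loud assumptions: a single short mono frame, a naive O(n^2) DFT standing in for a real FFT so the example needs only the standard library, and equal temperament with A4 = 440 Hz. The function names are made up for illustration.

```python
import cmath
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def dominant_frequency(samples, sample_rate):
    """Return the frequency in Hz of the strongest DFT bin, skipping DC."""
    n = len(samples)
    best_bin, best_mag = 1, 0.0
    for k in range(1, n // 2):
        # Naive DFT coefficient for bin k (an FFT computes the same values faster).
        coeff = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                    for t, s in enumerate(samples))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate / n


def frequency_to_note(freq):
    """Map a frequency to the nearest equal-tempered note name, assuming A4 = 440 Hz."""
    midi = round(69 + 12 * math.log2(freq / 440.0))
    return f"{NOTE_NAMES[midi % 12]}{midi // 12 - 1}"


# Example frame: a 440 Hz tone plus a weaker octave harmonic, 0.1 s at 8 kHz.
rate = 8000
frame = [math.sin(2 * math.pi * 440 * t / rate)
         + 0.5 * math.sin(2 * math.pi * 880 * t / rate)
         for t in range(800)]
note = frequency_to_note(dominant_frequency(frame, rate))
```

This picks exactly one pitch per frame, so chords, percussion and strong overtones are all lost or misread.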
The thing is that the process looks for spectral patterns, let's call it "harmonic content per unit of time," not just notes. Mere notes would result in lots and lots of false positives.
Let's just agree that the process is not too far removed from my initial brief description, and should be simple to implement, as the article shows. For any competent signal processing engineer, this should all be evident, which was the main point.

Also, even if you have many false positives, you have already narrowed down the search, and this allows you to do more brute-force searching like computing cross-correlations.
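
As an illustration of that second stage, here is a minimal sketch of brute-force normalized cross-correlation over an already shortlisted set of candidates. It assumes the query and references are plain lists of samples at the same rate; `shortlist` and both function names are hypothetical, and real systems would likely correlate spectral features rather than raw samples.

```python
import math


def normalized_xcorr_peak(query, reference):
    """Slide query across reference and return (best_offset, best_score).

    The score is the Pearson correlation of the query with each window,
    so it lies in [-1, 1] and ignores gain and DC offset.
    """
    m = len(query)
    q_mean = sum(query) / m
    q = [x - q_mean for x in query]
    q_norm = math.sqrt(sum(x * x for x in q))
    best_offset, best_score = 0, -1.0
    for offset in range(len(reference) - m + 1):
        window = reference[offset:offset + m]
        w_mean = sum(window) / m
        w = [x - w_mean for x in window]
        w_norm = math.sqrt(sum(x * x for x in w))
        if q_norm == 0 or w_norm == 0:
            continue  # a flat window carries no information to correlate
        score = sum(a * b for a, b in zip(q, w)) / (q_norm * w_norm)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score


def rerank(query, shortlist):
    """Pick the shortlisted candidate (name -> samples) with the highest peak."""
    return max(shortlist, key=lambda name: normalized_xcorr_peak(query, shortlist[name])[1])
```

The cost is proportional to query length times reference length per candidate, which is why it only makes sense after the cheap first pass has cut the candidate list down.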

Where are you going to get 'the music'? There are millions and millions of hours of music out there, how are you going to gather and fingerprint it all?
Uhh isn’t creating the “fingerprint” the non-straightforward part? Keep in mind you could start listening at any point in the song as well.