Hacker News new | ask | show | jobs
by thehappypm 1405 days ago
What is a signature? How is a signature computed from a noisy audio stream, over a mall speaker? How is a signature computed from an arbitrary starting point?
4 comments

IIRC, it's uses a Fast Fourier Transform of the time delay between high notes in the song to generate a series of "hashes" that are stored a db. Those ids can be calculated locally on the phone and then its a simple db lookup to retrieve potential hits. When Shazam adds a song to the db, they compute a series of "hashes" so you can identify at any point in the tune.
Wow, that's fascinating! I just ended up down the rabbit hole reading Avery Wang's "An Industrial-Strength Audio Search Algorithm" (linked in this thread) - it's such a cool way of "fingerprinting" pieces of music data.
My original comment was from memory of reading a post about how it worked a few years ago. Looking at what you read, I think the gist of what said is right, though it seems they use a different algorithm than FFT.

Totally agree though. It is something that opened my mind to thinking of a way to solve that problem in a way that actually works. Shazam definitely looked like magic the first time I saw it work.

TL;DR (from skimming thru the paper) he figured that a song's spectrogram looks like a starry sky, so matching a song is like finding a constellation on the sky. How do you do it efficiently? By searching for simple features of your constellation, such as pairs or triples of bright stars - those can be pre-hashed to find matches instantaneously. Once a possible match is found, you compare the rest of the constellation. Nothing breathtaking, in other words. However, among all the men who talked, he was the one who both talked and did, and that's his achievement.
Brilliant stuff is easy to understand, a lot harder to come up with. I could do that! (With a little help from wikipedia, audio processing libraries, the answer sheet, and the knowledge that it's possible in the first place)

To me, this highlights how hashing is the closest thing programmers have to magic.

Create a compound signature. You don't just take one measurement but many measurements and then assess the probabilities. You may have people talking in a mall, but they will be in a narrow frequency band. Similarly you can analyze the repeating elements. Keep iterating and adding stuff until f(signal) performs well
The closest to the ideal signature?