Hacker News
by genevoronkov 4777 days ago
I mirrored this implementation a while ago, since the full source isn't available. It was not nearly as successful as the blogger portrays. For example, if I used a high-quality mono WAV file to create a fingerprint, it would have a hard time identifying the same track encoded as an MP3. It seems the spectral maxima actually get shifted and merged by compression. In other words, there's a reason Shazam uses entropy-based anchor points to help it pick hashing values.

I wonder whether they bound the fingerprint search to human-audible frequencies. MP3, as a lossy codec, works by discarding information in the input signal that listeners are unlikely to hear, including inaudible frequencies. I believe this could be mirrored in the implementation by running the frequency-domain peak-picking algorithm only over specific bin ranges.
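To make the idea concrete, here is a minimal sketch of peak-picking restricted to chosen frequency bands, using only NumPy. It is not the blogger's code or Shazam's method; the function name `band_limited_peaks` and the band edges in `BANDS_HZ` are hypothetical values picked for illustration.

```python
import numpy as np

# Hypothetical band edges in Hz, chosen only for illustration.
BANDS_HZ = [(300, 1000), (1000, 2000), (2000, 4000)]


def band_limited_peaks(frame, sample_rate, bands_hz=BANDS_HZ):
    """Return (frequency_hz, magnitude) of the strongest FFT bin in each band.

    Bins outside every band are never inspected, so content above or
    below the chosen ranges cannot become a fingerprint peak.
    """
    frame = np.asarray(frame, dtype=float)
    # A Hann window reduces spectral leakage between neighbouring bins.
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)

    peaks = []
    for low, high in bands_hz:
        idx = np.flatnonzero((freqs >= low) & (freqs < high))
        if idx.size == 0:
            continue  # band is narrower than one bin at this frame size
        best = idx[np.argmax(spectrum[idx])]
        peaks.append((float(freqs[best]), float(spectrum[best])))
    return peaks


# Usage: two in-band tones plus an 18 kHz tone that lies outside every band.
sample_rate = 44100
t = np.arange(4096) / sample_rate
frame = (np.sin(2 * np.pi * 440 * t)
         + 0.5 * np.sin(2 * np.pi * 1500 * t)
         + np.sin(2 * np.pi * 18000 * t))
peaks = band_limited_peaks(frame, sample_rate)
```

With a 4096-sample frame at 44.1 kHz each bin is roughly 10.8 Hz wide, so a reported peak can only be as precise as that. Restricting the bands this way keeps very high frequencies out of the fingerprint, but it would not by itself correct peaks that shift within a band.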
I don't recall whether the paper specifies the frequency ranges used, but my implementation was bounded to audible frequencies. I was going to use a hill-climbing search to find optimal frequency ranges, but came to the conclusion that my implementation was too flawed regardless. When I looked at the two graphs side by side (compressed vs. uncompressed), they looked nothing alike. For example, a peak might be in the same region, but it would be shifted.