I agree with the other folks here: Shazam is one of those things that still works just fine, and I have no fucking clue how they do it. What do they compare the recorded audio with?
They "fingerprint" the audio and then compare the fingerprint to a large corpus. The hard parts are making the fingerprint resilient to noise (and to position within a track!) and finding a fast way to search the corpus of known fingerprints.
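To make that concrete, here's a minimal Python sketch of peak-pair ("constellation") hashing. It roughly follows the idea in Avery Wang's 2003 Shazam paper, but it isn't necessarily what Shazam runs today. It assumes spectrogram peaks have already been extracted as (time_frame, freq_bin) pairs. The constants FAN_OUT and MAX_DT, the function names, and the toy peak lists are all made up for illustration. Pairing nearby peaks into (f1, f2, dt) hashes makes the fingerprint independent of where the clip starts. A dict lookup keeps search fast. Voting on the time offset between query and track filters out hashes that match by accident or come from noise.

```python
from collections import Counter, defaultdict

# Hypothetical tuning knobs, chosen only for this toy example.
FAN_OUT = 5   # how many later peaks each anchor peak is paired with
MAX_DT = 64   # max frame gap allowed between paired peaks


def hashes(peaks):
    """Yield ((f1, f2, dt), anchor_time) for pairs of nearby spectrogram peaks.

    Peaks are (time_frame, freq_bin) tuples, assumed already extracted.
    The hash holds only relative time (dt), so it doesn't care where the
    recording started.
    """
    peaks = sorted(peaks)
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt > MAX_DT:
                break
            if dt == 0:
                continue
            yield (f1, f2, dt), t1
            paired += 1
            if paired == FAN_OUT:
                break


def build_index(tracks):
    """Map each hash to every (track_id, anchor_time) where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in hashes(peaks):
            index[h].append((track_id, t))
    return index


def identify(index, query_peaks):
    """Return (track_id, offset, votes) for the best match, or None.

    Every matching hash votes for (track, track_time - query_time). A real
    match piles its votes onto a single offset. Chance matches scatter.
    """
    votes = Counter()
    for h, t_query in hashes(query_peaks):
        for track_id, t_track in index.get(h, ()):
            votes[(track_id, t_track - t_query)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


# Toy "songs": made-up peak lists, not real audio.
song_a = [(0, 10), (3, 40), (5, 22), (9, 31), (12, 10), (15, 40), (20, 5)]
song_b = [(0, 7), (2, 50), (6, 13), (11, 7), (14, 50)]
index = build_index({"a": song_a, "b": song_b})

# A clip of song_a recorded from frame 5 onward, re-timed to start at 0,
# plus one spurious "noise" peak at (2, 99).
clip = [(t - 5, f) for t, f in song_a if t >= 5] + [(2, 99)]
print(identify(index, clip))  # ('a', 5, 10)
```

The winning offset (5) tells you where in the track the clip starts. The noise peak only produces hashes that match nothing. One hash, (10, 40, 3), occurs twice in song_a and casts a stray vote at offset -7, which is exactly why you vote on offsets instead of just counting hash hits.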