To the author: Have you tried using a logarithmic frequency scale in the spectrogram? [1] That representation is closer to how humans perceive sound, and it gives you finer resolution in the lower frequencies. [2] If you want to bring your representation even closer to human perception, take a look at Google's CARFAC research. [3] Basically, they model the ear. I've prepared a Python utility for converting sound to a Neural Activity Pattern (it resembles a spectrogram when you plot it) here: https://github.com/iver56/carfac/tree/master/util
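For anyone who wants to try the log-frequency idea quickly, here is a minimal sketch using only NumPy. It is not the author's pipeline and not CARFAC: it simply averages the magnitudes of an ordinary FFT into geometrically spaced bands. The function name `log_band_spectrum` and the defaults `n_bands=48` and `f_min=27.5` are assumptions picked for illustration.

```python
import numpy as np

def log_band_spectrum(signal, sample_rate, n_bands=48, f_min=27.5):
    """Pool a linear-frequency magnitude spectrum into log-spaced bands.

    Hypothetical helper for illustration; n_bands and f_min are arbitrary.
    """
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    # Band edges grow geometrically, so low frequencies get narrower bands.
    edges = np.geomspace(f_min, sample_rate / 2.0, n_bands + 1)
    pooled = np.zeros(n_bands)
    for i in range(n_bands):
        mask = (freqs >= edges[i]) & (freqs < edges[i + 1])
        if mask.any():  # very narrow bands may contain no FFT bin
            pooled[i] = spectrum[mask].mean()
    return edges, pooled

# Usage: a one-second 440 Hz sine should peak in the band containing 440 Hz.
sr = 8000
t = np.arange(sr) / sr
edges, pooled = log_band_spectrum(np.sin(2 * np.pi * 440 * t), sr)
peak = int(np.argmax(pooled))
print(edges[peak], edges[peak + 1])
```

Applying this to each short frame of a recording would give one column of a log-frequency spectrogram. A mel filterbank or a constant-Q transform would be the more usual choice in practice.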
I don't think this problem is bound by absolute frequency resolution. The tightest distance between two notes on a typical piano is ~2 Hz, and if you assume a doubling between octaves you're at <90 notes. The temporal changes and relative chord progressions probably give more info.
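The roughly 2 Hz figure can be checked with a few lines of arithmetic. This sketch assumes a standard 88-key piano in 12-tone equal temperament with A4 (key 49) tuned to 440 Hz; none of those assumptions are stated in the comment above, and `key_frequency` is a name made up for this example.

```python
def key_frequency(n, a4=440.0):
    """Frequency in Hz of piano key n (1..88), assuming equal temperament."""
    # Key 49 is A4; each semitone multiplies the frequency by 2**(1/12).
    return a4 * 2.0 ** ((n - 49) / 12.0)

freqs = [key_frequency(n) for n in range(1, 89)]

# Spacing between adjacent keys; it is smallest at the bottom of the keyboard.
gaps = [hi - lo for lo, hi in zip(freqs, freqs[1:])]
smallest_gap = min(gaps)

print(f"lowest key: {freqs[0]:.2f} Hz")
print(f"smallest gap between adjacent keys: {smallest_gap:.3f} Hz")
```

Under these assumptions the lowest key is 27.5 Hz and the tightest spacing is about 1.6 Hz, between the two lowest keys. That is consistent with the ~2 Hz estimate.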
Thanks for your insights! I agree that log/mel spectrograms could be even more detailed and effective, and could be used with the SoX patch discussed here https://sourceforge.net/p/sox/feature-requests/176/.