

by iverjo 3530 days ago

To the author: Have you tried using a logarithmic frequency scale in the spectrogram? [1] That representation is closer to the way humans perceive sound and gives you finer resolution in the lower frequencies. [2] If you want to bring your representation even closer to human perception, take a look at Google's CARFAC research. [3] Basically, they model the ear. I've prepared a Python utility for converting sound to a Neural Activity Pattern (it resembles a spectrogram when you plot it) here: https://github.com/iver56/carfac/tree/master/util

[1] https://sourceforge.net/p/sox/feature-requests/176/

[2] https://en.wikipedia.org/wiki/Mel_scale

[3] http://research.google.com/pubs/pub37215.html
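To illustrate the "finer resolution in the lower frequencies" point, here is a minimal sketch of the mel mapping. It assumes the common 2595 * log10(1 + f/700) variant of the mel formula; the function names, the 0-8000 Hz range, and the 10-band split are made up for the example and are not taken from SoX or CARFAC.

```python
import math


def hz_to_mel(f_hz):
    """Convert a frequency in Hz to mels (assumed 2595/700 variant)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_band_edges(f_min, f_max, n_bands):
    """Return n_bands + 1 band edges in Hz, equally spaced on the mel scale."""
    m_min, m_max = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_max - m_min) / n_bands
    return [mel_to_hz(m_min + i * step) for i in range(n_bands + 1)]


# Hypothetical settings: 10 bands covering 0 to 8000 Hz.
edges = mel_band_edges(0.0, 8000.0, 10)
widths = [hi - lo for lo, hi in zip(edges, edges[1:])]

for lo, hi in zip(edges, edges[1:]):
    print(f"{lo:8.1f} Hz to {hi:8.1f} Hz  (width {hi - lo:7.1f} Hz)")
```

The bands are equally wide in mels, so in Hz the low bands come out narrow and the high bands wide, which is where the extra low-frequency detail comes from.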

3 comments

stakecounter 3529 days ago

Mel-scale spectrograms are the approach taken in a research paper that uses roughly the same technique as the one described in this post: https://dl.dropboxusercontent.com/u/19706734/paper_pt.pdf


Terribledactyl 3529 days ago

I don't think this problem is bound by absolute frequency resolution. The tightest spacing between two notes on a typical piano is ~2 Hz, and if you assume a doubling between octaves you're at <90 notes. The temporal changes and relative chord progressions probably give more info.
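The arithmetic behind those figures can be checked with a short sketch. It assumes an 88-key piano in 12-tone equal temperament with A4 (key 49) tuned to 440 Hz; none of these settings come from the comment itself, and the function name is made up for the example.

```python
def piano_key_frequency(key, a4_hz=440.0):
    """Frequency in Hz of piano key `key` (1..88), assuming equal temperament
    with A4 at key 49. Each octave (12 keys) doubles the frequency."""
    return a4_hz * 2.0 ** ((key - 49) / 12.0)


# All 88 keys, from A0 (key 1) to C8 (key 88).
freqs = [piano_key_frequency(k) for k in range(1, 89)]

# Spacing in Hz between each pair of adjacent keys.
gaps = [hi - lo for lo, hi in zip(freqs, freqs[1:])]
min_gap = min(gaps)

print(f"keys: {len(freqs)}")
print(f"lowest note:  {freqs[0]:.2f} Hz")
print(f"highest note: {freqs[-1]:.2f} Hz")
print(f"smallest gap between adjacent notes: {min_gap:.3f} Hz")
```

Under these assumptions the smallest gap sits at the very bottom of the keyboard (A0 to A#0) and is a little under 2 Hz, in line with the rough figure above.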


Despoisj 3529 days ago

Thanks for your insights! I agree that log/mel spectrograms could be even more detailed and effective, and they could be generated with the SoX patch discussed here: https://sourceforge.net/p/sox/feature-requests/176/


Despoisj 3529 days ago

I didn't intend to go that far in the human genre recognition parallel, but thanks for the references! Good job on the script too.
