

	by throwmenow_0140 3130 days ago
	Very cool stuff! It seems that all those solutions are based on the analysis of visual representations of spectrograms. Is this common or could you just use 2d arrays which encode the same information - would this be more performant? Nice blog post about this stuff: http://willdrevo.com/fingerprinting-and-audio-recognition-wi... - https://github.com/worldveil/dejavu

4 comments

doctoboggan 3130 days ago

I wrote up some of my experiments attempting to do what you are describing. I explain why you can't simply use a 2D array of an audio file. You can find my post here:

http://jack.minardi.org/software/computational-synesthesia/

You can also see the code behind it here:

https://github.com/jminardi/audio_fingerprinting

I am by no means an expert in this area and a few people have since told me I did a few stupid things in my analysis. But you might find it interesting.


rahimnathwani 3130 days ago

In this context, what's the difference between 'visual representations of spectrograms' and '2d arrays which encode the same information'? Algorithms don't have eyes. The way they 'see' is by reading '2d arrays'.


dest 3130 days ago

You mean 2d arrays containing the raw audio signal? No, this would not work because you do not know the phase along the y dimension when you want to compare to another signal.

Another method to detect an audio pattern is cross-correlation on the raw audio signal. But it is very expensive in both computation and memory.
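To make the cross-correlation idea concrete, here is a minimal sketch using NumPy. The function name `find_offset` and the random test signal are made up for illustration; this is not code from any of the libraries mentioned in this thread.

```python
import numpy as np

def find_offset(signal, snippet):
    """Return the sample index where `snippet` best aligns with `signal`."""
    # 'valid' mode slides the snippet over every full-overlap position,
    # so index k of the result corresponds to an offset of k samples.
    corr = np.correlate(signal, snippet, mode="valid")
    return int(np.argmax(corr))

# Hypothetical data: white noise, with the snippet cut out at sample 700.
rng = np.random.default_rng(0)
signal = rng.standard_normal(2000)
snippet = signal[700:900].copy()

offset = find_offset(signal, snippet)
print(offset)
```

The direct form above does roughly len(signal) * len(snippet) multiplications for a single comparison, which is why this gets expensive when you have to search a whole catalogue of tracks.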

The slowest step in fingerprinting is often the associated DB query. Lots of work to do there. In that space, Will Drevo's work is really good. I will share my DB implementation later.


throwmenow_0140 3130 days ago

I meant the spectrogram encoded as a 2d array, but I guess there isn't a big difference when the db query is the most expensive part.

I've always wondered: Is there a way to compare fingerprints with humming sounds or live recordings?

Those fingerprinting techniques don't seem to be suitable for those tasks, do you know of any methods to accomplish this?


dest 3130 days ago

There are special fingerprinting algorithms designed to tolerate sound modifications like pitch shifts https://biblio.ugent.be/publication/5754913 but they are not going to work with humming or live audio. I don't know if such a thing exists.

If you want to do some research, here is a short review paper on the topic http://www.cs.toronto.edu/~dross/ChandrasekharSharifiRoss_IS...

As for the 2d array spectrogram, it is not needed in my lib (except when plotting is activated). I only care about maxima in the spectrum of each data window. In other words, 1d spectra are enough.
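A rough sketch of what "maxima in the spectrum of each data window" could look like in NumPy. This is only an illustration and not the actual code of the lib mentioned above; `peak_frequencies`, the window size, the hop and the test tone are all assumptions.

```python
import numpy as np

def peak_frequencies(samples, sample_rate, window_size=1024, hop=512):
    """Return a list of (start_time_s, peak_freq_hz), one entry per window."""
    window = np.hanning(window_size)
    freqs = np.fft.rfftfreq(window_size, d=1.0 / sample_rate)
    peaks = []
    for start in range(0, len(samples) - window_size + 1, hop):
        frame = samples[start:start + window_size] * window
        # One 1d magnitude spectrum per window; no 2d array is ever stored.
        magnitude = np.abs(np.fft.rfft(frame))
        peak_hz = float(freqs[np.argmax(magnitude)])
        peaks.append((start / sample_rate, peak_hz))
    return peaks

# Hypothetical input: one second of a 1000 Hz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.sin(2 * np.pi * 1000 * t)

peaks = peak_frequencies(tone, sample_rate)
print(peaks[:3])
```

A real fingerprinter would usually keep several peaks per window and hash pairs of them, but the point stands: each window is processed and discarded on its own.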


ssalazar 3130 days ago

Spectrograms are a convenient way to visualize the data/algorithm but are rarely part of the actual analysis. They are already using the 2d array so to speak. In any case a spectrogram is just a 2d array where the magnitude of each array element is mapped to a color, so it's effectively the same thing. Few if any people use visual representations of sound for analysis, except for the crazies who run spectrograms through visual deep learning networks.
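For illustration, a small NumPy sketch of that relationship: the 2d array of magnitudes comes first, and the picture is just that array rescaled to pixel values. The helper names `spectrogram` and `to_grayscale`, the window parameters and the noise input are assumptions made up for this example.

```python
import numpy as np

def spectrogram(samples, window_size=256, hop=128):
    """Magnitude spectrogram: rows are frequency bins, columns are time frames."""
    window = np.hanning(window_size)
    frames = [samples[i:i + window_size] * window
              for i in range(0, len(samples) - window_size + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1)).T

def to_grayscale(spec):
    """Map magnitudes to 0..255 pixel values on a dB scale."""
    db = 20 * np.log10(spec + 1e-10)  # small offset avoids log(0)
    scaled = (db - db.min()) / (db.max() - db.min())
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical input: 4096 samples of white noise.
rng = np.random.default_rng(0)
samples = rng.standard_normal(4096)

spec = spectrogram(samples)   # what an algorithm would work on
image = to_grayscale(spec)    # what a human would look at
print(spec.shape, image.shape, image.dtype)
```

The image has the same shape as the array but is quantised to 256 levels, so analysing the picture instead of the array can only lose information.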


dest 3130 days ago

Uh, are you sure of what you are writing here? Time-frequency analysis (including spectrograms) is one of the very fundamental tools for signal processing.


ssalazar 3130 days ago

True, I was thinking of a spectrogram as purely a visualization of a time series of DFTs, but Matlab and other tools do not make this distinction.

I was mainly responding to the OP's distinction between analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.


throwmenow_0140 3130 days ago

> analyzing a visual representation and analyzing a "2d array" when they are basically the same thing.

This is what I mean. I guess their tooling just outputs graphics and it's easier to work with those than the pure 2d array in numpy or something similar.


jmmcd 3130 days ago

No, the graphics are only being used as part of the explanation. The algorithm is not working with them.
