by bradrn 1333 days ago
> Spectrograms are analytical tools, they don't convey the nature of sound: whether it's consonant or dissonant, cool or warm, pleasing or annoying.

But they do! It’s entirely possible for even inexperienced phoneticians to reconstruct speech given only a spectrogram — and it isn’t even that hard to do so. I cannot make any firm statements about these ACF images, but given that they present no temporal information, I find it difficult to imagine this being possible with them.

And as for ‘conveying the nature of sound’, I invite you to consider e.g. [0] or [1]. It’s easy to see on the spectrogram that some sounds are noisy, some are resonant, some are strong, some are weak, and so on.

[0] https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/spe...

[1] https://home.cc.umanitoba.ca/~robh/howto.html

1 comment

The radial coordinate on ACF images is the temporal coordinate. Each circular slice encodes one FFT frame. Although I'm hardly a novice at making sense of spectrograms, I don't find them visually appealing: they are just a schematic representation of sound for the eye. For example, here is my GPU implementation of the wavelet transform, which works for arbitrary wavelet functions (Haar, Morlet, whatever you can code in a GLSL function):

http://soundshader.github.io/cwt
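
To make the "radius is time, one ring per frame" layout more concrete, here is a rough Python sketch of how such a mapping could work. It is not the actual soundshader code. The names `frame_acf` and `polar_points` are made up for illustration, and two details are my assumptions: that the angle around each ring encodes the autocorrelation lag, and that later frames sit further from the centre. The autocorrelation is computed directly to keep the example dependency-free; an FFT-based version (inverse FFT of the power spectrum) would give the same values much faster.

```python
import math


def frame_acf(frame):
    """Circular autocorrelation of one frame, normalised so that lag 0 equals 1.

    Computed directly in O(n^2); an FFT-based implementation would be
    equivalent but faster.
    """
    n = len(frame)
    acf = [
        sum(frame[i] * frame[(i + lag) % n] for i in range(n))
        for lag in range(n)
    ]
    peak = acf[0] or 1.0  # avoid dividing by zero on a silent frame
    return [value / peak for value in acf]


def polar_points(signal, frame_size, hop):
    """Return (x, y, value) triples: one ring per frame, one point per lag.

    Assumptions (hypothetical layout): frame index -> radius, lag -> angle.
    """
    points = []
    n_frames = (len(signal) - frame_size) // hop + 1
    for t in range(n_frames):
        frame = signal[t * hop : t * hop + frame_size]
        acf = frame_acf(frame)
        radius = (t + 1) / n_frames  # later frames are drawn further out
        for lag, value in enumerate(acf):
            angle = 2 * math.pi * lag / frame_size
            points.append(
                (radius * math.cos(angle), radius * math.sin(angle), value)
            )
    return points


# Usage: a sine wave with a period of 8 samples, split into 16-sample frames.
signal = [math.sin(2 * math.pi * i / 8) for i in range(64)]
rings = polar_points(signal, frame_size=16, hop=16)
```

Each point's `value` would then be mapped to a colour. With this layout, time is preserved as distance from the centre, which is the point made above about the radial coordinate.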