by crazygringo 1334 days ago
These are beautiful artistically...

...but I don't have the slightest clue what they mean. I've certainly dabbled in FFT, spectrogram, and wavelet work, on top of a lot of IPA vowel work, but I'm missing the why behind the formulas given, and I'm missing how these plots are supposed to relate to frequencies visually.

A spectrogram of someone pronouncing vowels is extremely straightforward. Recognizing patterns of formants in spectrograms is quite simple.

So what is this trying to reveal that spectrograms don't? Besides that, what are the axes? Why are these circular or presumably polar? Why are they spiky? Why the particular blue/red bandpass filter? And what does autocorrelation have to do with vowels?

I'm not sure I've ever found myself so mystified by something I feel like I should have the background to understand quite easily.

If they're just supposed to be works of art, then that's cool. But the title "visual morphology of vowels" makes it seem like the plots are intended to reveal some kind of link between frequencies and the shape of the mouth, maybe? But the example images aren't even labeled by which vowel they represent, so I'm just baffled.


Spectrograms are analytical tools, they don't convey the nature of sound: whether it's consonant or dissonant, cool or warm, pleasing or annoying. We could, and do, analyze pictures with 2D spectrograms, but hardly anyone would argue that those spectrograms are true representations of pictures. And that's the question I've been trying to answer: if spectrograms and waveforms aren't the true images of sound, then what is?

On these ACF images, consonant frequencies produce regular patterns that look good because of their regular structure. High and low frequencies map to different colors, which appear to arrange themselves in a good-looking way; this effect is surprising to me. The interesting observation here is that the good-looking arrangements happen only for pleasing sounds. Different vowels, 29 in total, taken from Wikipedia's IPA table, produce distinct shapes; that's what I meant by "visual morphology".

The ACF data can be presented in any form (it's just data, after all), but I'm not interested in just information: I want the image to convey the "harmonic nature" of sound, and the polar coordinates happen to do this well.

There is a link to a demo there, and you can generate ACF images for any sounds you have; just make sure they are isolated 1-2 sec recordings. After looking at the images and listening to the sounds that correspond to them, you'll quickly notice some patterns and will be able to guess the sound by looking at its image.

> Spectrograms are analytical tools, they don't convey the nature of sound: whether it's consonant or dissonant, cool or warm, pleasing or annoying.

But they do! It’s entirely possible for even inexperienced phoneticians to reconstruct speech given only a spectrogram — and it isn’t even that hard to do so. I cannot make any firm statements about these ACF images, but given that they present no temporal information, I find it difficult to imagine this being possible with them.

And as for ‘conveying the nature of sound’, I invite you to consider e.g. [0] or [1]. It’s easy to see on the spectrogram that some sounds are noisy, some are resonant, some are strong, some are weak, and so on.

[0] https://home.cc.umanitoba.ca/~krussll/phonetics/acoustic/spe...

[1] https://home.cc.umanitoba.ca/~robh/howto.html

The radial coordinate on ACF images is the temporal coordinate. Each circular slice encodes one FFT frame. Although I'm hardly a novice in making sense of spectrograms, I don't find them visually appealing: they are just a schematic representation of sound for the eye. For example, here is my GPU implementation of a wavelet transform, which works for arbitrary wavelet functions (Haar, Morlet, whatever you can code in a GLSL function):

http://soundshader.github.io/cwt
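For readers trying to picture the polar layout described above (radius as time, one circular slice per frame), here is a rough Python sketch. It is not the actual soundshader code. The function names, the frame size, and the choice to map lag onto the angle are all assumptions made for illustration. It computes a circular autocorrelation by direct summation, which is the same quantity an inverse FFT of the power spectrum gives without zero padding (Wiener-Khinchin), just slower.

```python
import math

def autocorrelation(frame):
    """Normalized circular autocorrelation of one frame (direct O(n^2) sum).

    Hypothetical helper: an FFT-based version would compute the same values
    as the inverse FFT of the frame's power spectrum.
    """
    n = len(frame)
    mean = sum(frame) / n
    x = [v - mean for v in frame]
    energy = sum(v * v for v in x) or 1.0  # avoid dividing by zero on silence
    return [
        sum(x[i] * x[(i + lag) % n] for i in range(n)) / energy
        for lag in range(n)
    ]

def acf_polar_points(samples, frame_size=64):
    """Return (x, y, value) points for a polar ACF plot.

    Assumed layout: radius = frame index (time), angle = lag within the frame.
    """
    points = []
    n_frames = len(samples) // frame_size
    for t in range(n_frames):
        frame = samples[t * frame_size:(t + 1) * frame_size]
        acf = autocorrelation(frame)
        radius = (t + 1) / n_frames  # later frames land on outer circles
        for lag, value in enumerate(acf):
            angle = 2 * math.pi * lag / frame_size
            points.append((radius * math.cos(angle),
                           radius * math.sin(angle),
                           value))
    return points

# A pure tone with a period of 16 samples: each frame's ACF repeats
# every 16 lags, so every circle shows the same 4-fold pattern.
tone = [math.sin(2 * math.pi * i / 16) for i in range(256)]
points = acf_polar_points(tone, frame_size=64)
```

Coloring each point by its `value` would give one ring per frame. A periodic sound repeats the same angular pattern on every ring, which may be where the regular shapes come from, while noise has no stable pattern from ring to ring.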