by bradrn 1074 days ago
A short explanation as to how this works:

The voice can be modeled using two main components. The vocal cords are a periodic source of sound, which is then filtered by the mouth and tongue to produce vowel sounds [0]. The filter can be modeled as a set of band-pass filters, each of which lets through a specific band of frequencies; these bands are called ‘formants’ in acoustic phonetics. Different vowel sounds are produced by combining formants at different pitches in a systematic way [1]. You can hear this yourself by very slowly moving your mouth from saying an ‘eeeee’ sound to an ‘ooooo’ sound: if you listen carefully, you can hear one formant changing pitch while the others stay the same. (I like [2] as an intro to this kind of stuff.)

The ‘voder’ works by having one key for each band-pass filter. Pressing multiple keys adds the resulting sounds, producing an output sound with distinct formants. If you use the right formants, the resulting sound is very similar to that produced by a human mouth saying a specific vowel! Software such as the vowel editor in Praat [3] takes it further by allowing selection of formants from a standard vowel chart.
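For anyone who wants to poke at the source-plus-filter idea in code, here is a minimal sketch in plain Python. It is not how the Voder or the live demo is implemented: the sample rate, the impulse-train source, the two-pole resonator design, the 80 Hz bandwidth and all the function names are assumptions chosen for illustration, and the formant frequencies in the usage note are rough textbook values.

```python
import math

SAMPLE_RATE = 16000  # Hz; an arbitrary choice for this sketch


def impulse_train(pitch_hz, n_samples):
    """Periodic source: one unit impulse per pitch period (a crude stand-in for the vocal cords)."""
    period = round(SAMPLE_RATE / pitch_hz)
    return [1.0 if i % period == 0 else 0.0 for i in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz):
    """Two-pole resonant band-pass filter centred on one formant frequency."""
    r = math.exp(-math.pi * bandwidth_hz / SAMPLE_RATE)  # pole radius, below 1 so the filter is stable
    a1 = 2.0 * r * math.cos(2.0 * math.pi * centre_hz / SAMPLE_RATE)
    a2 = -r * r
    gain = 1.0 - r  # rough level normalisation
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = gain * x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def vowel(formants_hz, pitch_hz=100.0, n_samples=SAMPLE_RATE // 2, bandwidth_hz=80.0):
    """Filter one shared source through several band-pass filters and add the results,
    like holding down several keys at once."""
    source = impulse_train(pitch_hz, n_samples)
    bands = [resonator(source, f, bandwidth_hz) for f in formants_hz]
    return [sum(samples) for samples in zip(*bands)]


def band_energy(signal, freq_hz):
    """Energy of the signal at a single frequency (one-bin DFT), handy for checking the result."""
    w = 2.0 * math.pi * freq_hz / SAMPLE_RATE
    re = sum(s * math.cos(w * i) for i, s in enumerate(signal))
    im = sum(s * math.sin(w * i) for i, s in enumerate(signal))
    return re * re + im * im
```

With this, `vowel([300.0, 2300.0])` should come out roughly ‘ee’-like and `vowel([300.0, 900.0])` roughly ‘oo’-like: only the second formant differs, which matches the ‘one formant changing pitch’ experiment above. The samples can be written out with the standard `wave` module if you want to listen to them.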

[0] Consonantal sounds are a bit more complicated, since they tend to involve various different noise sources and transient disturbances of the sound. For instance, /ʃ/ (the ‘sh’ sound) is noise of a lower frequency than /s/. I can’t work out how Harper produced the difference between those two sounds in the video — it seems to be impossible to do this with the live demo. In fact, any sort of pitch control seems to be impossible in the demo.

[1] This is how overtone singing and throat singing work! Selectively amplifying one formant gives the impression that you’re singing that note at the same time as the ‘base’ pitch. In fact, when you do that, your vocal cords are producing a pitch plus all its overtones, while your mouth is enhancing one overtone and filtering out the others.
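A tiny numeric sketch of that last point, with made-up numbers: the overtones of the base pitch sit at integer multiples of it, and a narrow resonance picks out whichever one lies closest to its centre. The 1/k source roll-off, the Lorentzian-shaped gain curve, the 60 Hz bandwidth and the function names are all assumptions for illustration, not measurements of a real voice.

```python
def resonance_gain(freq_hz, centre_hz, bandwidth_hz):
    """Gain of a single narrow resonance: 1 at the centre, 1/2 at centre +/- bandwidth/2."""
    return 1.0 / (1.0 + ((freq_hz - centre_hz) / (bandwidth_hz / 2.0)) ** 2)


def loudest_overtone(base_hz, centre_hz, bandwidth_hz=60.0, count=20):
    """Return (harmonic number, frequency) of the overtone the resonance makes loudest.

    Assumes the source's k-th harmonic has amplitude 1/k (sawtooth-like).
    """
    def level(k):
        return (1.0 / k) * resonance_gain(base_hz * k, centre_hz, bandwidth_hz)

    best = max(range(1, count + 1), key=level)
    return best, base_hz * best
```

For example, with a base pitch of 150 Hz and the resonance tuned to 1200 Hz, `loudest_overtone(150, 1200)` picks the 8th harmonic at 1200 Hz, even though the base pitch itself starts out eight times stronger at the source.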

[2] https://newt.phys.unsw.edu.au/jw/voice.html

[3] https://www.fon.hum.uva.nl/praat/ — probably also available from your favourite Linux distro!


There's also a very nice simulation where you can play with the different parts of the vocal tract:

https://imaginary.github.io/pink-trombone/

I made a fork with a few more features; it might even work on your phone browser:

https://jmiskovic.github.io/voicebox

Thank you for this. I had a lot of fun scaring my cat in bed, and it inspired me to become a late middle-aged opera savant.
I actually am a late middle-aged opera savant but sadly I have no cat to scare
Unfortunately this webapp (along with the original Pink Trombone) produces super glitchy audio and uses 95% of the CPU in Chrome v114.0.5735.198 on Ubuntu 22.04 (running on my ThinkPad X220).
That's rather strange. The graphics part is lightweight (pre-rendering the background and then drawing a few shapes), but if you could shrink the browser window to very small dimensions and test again, we could rule that part out.

The audio part is a bit more involved. The vocal tract is simulated in segments, each segment receiving, filtering and reflecting the sound-wave energy. The algorithm is computationally heavy, but it ran well on my mediocre smartphone.
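To give a feel for the per-segment work, here is a minimal sketch of one textbook way to do this kind of segmented tube simulation (a Kelly-Lochbaum style scattering step). It is an illustration under stated assumptions, not the code of either webapp: the function names and the two end reflection values are invented for the example.

```python
def reflection_coefficients(areas):
    """Pressure-wave reflection coefficient at each junction between neighbouring segments.

    areas[k] is the cross-sectional area of segment k; a change in area reflects
    part of the wave, and equal areas give 0 (no reflection).
    """
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


def step(right, left, coeffs, glottis_in,
         glottis_reflection=0.75, lip_reflection=-0.85):  # assumed end conditions
    """Advance the tube by one time step.

    right[k] / left[k] are the right-going and left-going waves in segment k.
    Returns the new (right, left) lists; the work is proportional to the segment count.
    """
    n = len(right)
    new_right = [0.0] * n
    new_left = [0.0] * n
    # Closed-ish glottis end: inject the source and reflect what comes back.
    new_right[0] = glottis_in + glottis_reflection * left[0]
    # Open lip end: most of the wave is reflected back with inverted sign.
    new_left[n - 1] = lip_reflection * right[n - 1]
    # Scattering at every interior junction.
    for k, r in enumerate(coeffs):
        new_right[k + 1] = (1.0 + r) * right[k] - r * left[k + 1]
        new_left[k] = r * right[k] + (1.0 - r) * left[k + 1]
    return new_right, new_left
```

Every output sample needs one pass over all junctions, so the cost grows linearly with the number of segments, which is why dropping segments trades quality for CPU time.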

Maybe if stuttering is detected it could lower the number of tract segments, which also lowers the quality. Increasing the buffer size would probably also help with glitches but I don't think it would solve the high CPU utilization.

Apparently the Voder had a pitch pedal:

https://imgz.org/i9TzhzWu/

Ah, that would explain it. Thanks for finding that image!
No problem! It's from a video linked in a thread below (the extended World's Fair presentation).