by blincoln
1129 days ago
Thank you! Seems like that project was incredibly far ahead of its time. The physical-modelling aspect is super interesting. Does that mean that the similarity in sound to formant-based speech synthesis is because they're both using a sawtooth wave, noise, or other relatively simple sound as the raw input? I always imagined that a physical-modelling speech synthesizer fed by a sawtooth wave would sound more like a vocoder than Votrax or TI LPC output does, but I guess not.
|
Essentially, yes. Both are known as "source-filter" models. A sawtooth, narrow pulse, or impulse train is a good approximation of glottal excitation for the source signal, though many articulatory speech models use a more specialized source model that's analytically derived from real waveforms produced by the glottis. The Liljencrants-Fant Derivative Glottal Waveform model is the most common, but a few others exist.
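To make the "simple source" idea concrete, here's a rough sketch of two of the excitation signals mentioned above: a naive sawtooth and an impulse train. The function names and parameters are just mine for illustration, and neither is band-limited, so a real synthesizer would likely want something more careful (or an LF-style glottal model instead).

```python
def sawtooth(f0, sample_rate, n_samples):
    """Naive (non-band-limited) sawtooth in [-1, 1) at pitch f0 Hz."""
    out = []
    phase = 0.0                      # normalized phase in [0, 1)
    step = f0 / sample_rate          # phase advance per sample
    for _ in range(n_samples):
        out.append(2.0 * phase - 1.0)
        phase = (phase + step) % 1.0
    return out


def impulse_train(f0, sample_rate, n_samples):
    """One unit impulse per pitch period (period rounded to whole samples)."""
    period = round(sample_rate / f0)
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]


# Example: 100 ms of a 100 Hz source at an 8 kHz sample rate.
saw = sawtooth(100.0, 8000, 800)
pulses = impulse_train(100.0, 8000, 800)
```

Either list can then be fed into whatever filter stands in for the vocal tract.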
In formant synthesis, the formant frequencies are known ahead of time and are explicitly added to the spectrum using some kind of peak filter. With waveguides, those formants are implicitly created based on the shape of the vocal tract (the vocal tract here is approximated as a series of cylindrical tubes with varying diameters).
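A small sketch of that difference, with the caveat that the names here are made up for illustration and this isn't taken from any particular synthesizer. In the formant case you pick the frequency and bandwidth and get a two-pole resonator directly. In the tube case you only specify diameters, and the junction reflection coefficients (Kelly-Lochbaum style) follow from the areas; the formants then emerge from the scattering. Sign conventions for the reflection coefficient vary between references, so treat the one below as an assumption.

```python
import math


def resonator_coeffs(freq_hz, bandwidth_hz, sample_rate):
    """Explicit formant: two-pole resonator
    y[n] = x[n] + a1*y[n-1] + a2*y[n-2], with no gain normalization."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)   # pole radius
    theta = 2.0 * math.pi * freq_hz / sample_rate         # pole angle
    return 2.0 * r * math.cos(theta), -r * r


def resonate(signal, a1, a2):
    """Run the two-pole resonator over a list of samples."""
    y1 = y2 = 0.0
    out = []
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def reflection_coefficients(diameters):
    """Implicit formants: reflection coefficient at each junction between
    adjacent cylindrical sections, k = (A_i - A_next) / (A_i + A_next)."""
    areas = [math.pi * (d / 2.0) ** 2 for d in diameters]
    return [(a - b) / (a + b) for a, b in zip(areas, areas[1:])]


# Formant synthesis: the 500 Hz peak is chosen up front.
a1, a2 = resonator_coeffs(500.0, 50.0, 8000)
impulse_response = resonate([1.0] + [0.0] * 63, a1, a2)

# Waveguide: only the tube shape is given (diameters in arbitrary units).
k = reflection_coefficients([1.0, 1.0, 2.0])
```

Equal neighbouring diameters give a coefficient of zero (no reflection), and the further apart the areas are, the stronger the reflection at that junction.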