Hacker News
by TheOtherHobbes 1133 days ago
The vo(co)der uses banks of fixed filters to apply the broad shape of a spectrum to an input signal. It's basically an automated graphic EQ. The level of each fixed band in the modulator is copied to the equivalent band in the carrier.

The bandpass filters have a steeper cutoff than usual and are flatter at the top of the passband than usual. And the centre frequencies aren't linearly spaced. But otherwise - it's just a fancy graphic EQ.
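
To make the "automated graphic EQ" idea concrete, here is a rough sketch in plain Python. Everything in it is an assumption for illustration: the function names, the single biquad per band (a real vocoder uses much steeper filters, as noted above), the 50 Hz envelope smoothing, and the half-octave band spacing.

```python
import math

def bandpass(signal, centre_hz, q, sample_rate):
    """One biquad band-pass (RBJ cookbook form, 0 dB peak gain)."""
    w0 = 2 * math.pi * centre_hz / sample_rate
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b0, b2 = alpha / a0, -alpha / a0
    a1, a2 = -2 * math.cos(w0) / a0, (1 - alpha) / a0
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in signal:
        y = b0 * x + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1, y2, y1 = x1, x, y1, y
        out.append(y)
    return out

def envelope(signal, sample_rate, cutoff_hz=50.0):
    """Level detector: rectify, then smooth with a one-pole low-pass."""
    coeff = math.exp(-2 * math.pi * cutoff_hz / sample_rate)
    level = 0.0
    out = []
    for x in signal:
        level = coeff * level + (1 - coeff) * abs(x)
        out.append(level)
    return out

def vocode(modulator, carrier, sample_rate, centres, q=6.0):
    """Copy the level of each modulator band onto the same carrier band."""
    n = min(len(modulator), len(carrier))
    out = [0.0] * n
    for f in centres:
        level = envelope(bandpass(modulator[:n], f, q, sample_rate), sample_rate)
        band = bandpass(carrier[:n], f, q, sample_rate)
        for i in range(n):
            out[i] += level[i] * band[i]
    return out

# Non-linear (here half-octave) spacing of the fixed centre frequencies.
centres = [200 * 2 ** (k / 2) for k in range(8)]
```

The filter centres never move. Only the per-band gains change, and they are driven entirely by the modulator.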

The formant approach uses dynamic filters. It's more like an automated parametric EQ. Each formant is modelled with a variable BPF with its own time-varying level, frequency, and possibly Q. You apply that to a simple buzzy waveform and get speech-like sounds out. If you vary the pitch of the buzz you can make the output "sing."
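
A minimal sketch of that, again in plain Python and under stated assumptions: the buzz is just an impulse train, each formant is a two-pole resonator whose centre frequency can change every sample, and the formant values in the example are rough illustrative numbers rather than measured data. All names are made up for the example.

```python
import math

def buzz(pitch_hz, n_samples, sample_rate):
    """Impulse train: a crude stand-in for the buzzy source."""
    out = [0.0] * n_samples
    phase = 0.0
    for i in range(n_samples):
        phase += pitch_hz / sample_rate
        if phase >= 1.0:
            phase -= 1.0
            out[i] = 1.0
    return out

def resonator(signal, freqs_hz, bandwidth_hz, sample_rate):
    """Two-pole band-pass; freqs_hz holds one centre frequency per sample."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius
    y1 = y2 = 0.0
    out = []
    for x, f in zip(signal, freqs_hz):
        theta = 2 * math.pi * f / sample_rate
        y = (1 - r) * x + 2 * r * math.cos(theta) * y1 - r * r * y2
        y2, y1 = y1, y
        out.append(y)
    return out

def glide(start_hz, end_hz, n_samples):
    """Linear frequency track from start_hz to end_hz."""
    span = max(n_samples - 1, 1)
    return [start_hz + (end_hz - start_hz) * i / span for i in range(n_samples)]

def formant_voice(pitch_hz, tracks, n_samples, sample_rate):
    """tracks: list of (per-sample frequencies, bandwidth_hz, level)."""
    source = buzz(pitch_hz, n_samples, sample_rate)
    out = [0.0] * n_samples
    for freqs, bandwidth, level in tracks:
        for i, y in enumerate(resonator(source, freqs, bandwidth, sample_rate)):
            out[i] += level * y
    return out

# Two formants sliding from a roughly "ah"-like to a roughly "ee"-like shape.
n, sr = 4000, 8000
tracks = [(glide(700, 300, n), 80, 1.0), (glide(1100, 2300, n), 100, 0.5)]
voice = formant_voice(110, tracks, n, sr)
```

Changing the first argument of `formant_voice` changes the pitch without touching the formants, which is the "sing" part.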

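LPC uses a similar source-filter model, but it gets the filter from linear prediction: each sample is estimated as a weighted sum of the previous few, and the weights define an all-pole filter that follows the formant peaks. The weights only need updating once per frame, so instead of having to control all the parameters at or near audio rate, you can drop the control rate right down and still get something that can be understood.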
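
For the curious, here is a sketch of the analysis half of LPC as it is commonly done (autocorrelation method plus the Levinson-Durbin recursion). The function names, the unwindowed frames, and the frame length are assumptions for illustration, and this is not any particular codec. The point is that you end up with one small coefficient set per frame rather than per sample.

```python
def autocorrelation(frame, max_lag):
    """r[lag] = sum over i of frame[i] * frame[i + lag]."""
    return [
        sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        for lag in range(max_lag + 1)
    ]

def levinson_durbin(r, order):
    """Solve for predictor weights a[k] with x[n] ~ sum_k a[k] * x[n - k].

    Returns (weights for lags 1..order, remaining prediction-error energy).
    """
    a = [0.0] * (order + 1)
    error = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error  # reflection coefficient for this step
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= 1 - k * k
    return a[1:], error

def lpc_frames(signal, frame_len, order):
    """One coefficient set per frame: this is the low control rate."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        r = autocorrelation(signal[start:start + frame_len], order)
        if r[0] > 0:  # skip silent frames
            frames.append(levinson_durbin(r, order))
    return frames
```

With, say, 20 ms frames at 8 kHz and order 10, that is ten numbers plus a gain every 160 samples. Resynthesis runs a buzz or noise source through the all-pole filter those weights describe.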

There are more modern systems. FOF and FOG use granular synthesis to create formant sounds directly. Controlling the frequency and envelope of the grains is equivalent to filtering a raw sound, but is more efficient.
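
A rough sketch of the FOF idea, under assumptions of my own (the names, a 3 ms raised-cosine attack, 20 ms grains, and illustrative formant values): each grain is a short damped sinusoid at the formant frequency, and one grain per formant is fired every pitch period. No filter runs at all. The grain's frequency sets the formant centre and its decay rate sets the bandwidth.

```python
import math

def fof_grain(formant_hz, bandwidth_hz, attack_s, duration_s, sample_rate):
    """Damped sinusoid with a smooth attack: one formant 'ping'."""
    n = int(duration_s * sample_rate)
    attack_n = max(int(attack_s * sample_rate), 1)
    decay = math.pi * bandwidth_hz  # faster decay means a wider formant
    grain = []
    for i in range(n):
        t = i / sample_rate
        env = math.exp(-decay * t)
        if i < attack_n:
            env *= 0.5 * (1 - math.cos(math.pi * i / attack_n))
        grain.append(env * math.sin(2 * math.pi * formant_hz * t))
    return grain

def fof_voice(pitch_hz, formants, n_samples, sample_rate):
    """formants: list of (centre_hz, bandwidth_hz, level)."""
    out = [0.0] * n_samples
    period = sample_rate / pitch_hz  # samples between grain onsets
    for centre, bandwidth, level in formants:
        grain = fof_grain(centre, bandwidth, 0.003, 0.02, sample_rate)
        onset = 0.0
        while onset < n_samples:
            start = int(onset)
            for i, g in enumerate(grain):
                if start + i >= n_samples:
                    break
                out[start + i] += level * g  # overlap-add the grain
            onset += period
    return out

voice = fof_voice(100, [(700, 80, 1.0), (1100, 100, 0.5)], 800, 8000)
```

The cost per formant is one table read and one add per sample, which is where the efficiency claim comes from.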

FOF and FOG evolved into PSOLA, which is basically real-time granulated formant synthesis and pitch shifting.