Hacker News
by marcedwards 404 days ago
Did you listen to the example audio in the video? Soft synths and digital emulation can be absolutely amazing these days, but the VSM201 and Ultimate VoIS are in their own league. It’d be pretty easy to pick them out in a blind test against other vocoders.

Oh, it also might be of interest that the IVL algorithm isn’t FFT-based. I think their harmonizers sound better than the rest, so maybe FFT isn’t the best way to go.

Yes exactly, I was really excited when I found out that you do not need an FFT to do speech processing.

If you look at the source code of phone/voice codecs such as GSM, Speex, or Opus, you can see that the spectral envelope (or the configuration of a physical tube model of the vocal tract) can be estimated in the time domain with linear prediction coefficients (LPC).
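A minimal sketch of that time-domain analysis in C, assuming the autocorrelation method of LPC (the codecs above each add their own windowing, lag windowing, and quantization, none of which is shown). The frame's autocorrelation is summed directly from lagged products, and the normal equations are solved with the Levinson-Durbin recursion, using the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p. The names `autocorrelation`, `levinson_durbin`, and `MAX_ORDER` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 32  /* hypothetical upper bound on the predictor order */

/* Unnormalized autocorrelation r[0..order] of one frame, computed directly
   in the time domain (no FFT). The overall scale cancels in the recursion. */
void autocorrelation(const double *x, size_t n, double *r, int order)
{
    for (int lag = 0; lag <= order; lag++) {
        double sum = 0.0;
        for (size_t i = (size_t)lag; i < n; i++)
            sum += x[i] * x[i - lag];
        r[lag] = sum;
    }
}

/* Levinson-Durbin recursion: fills a[0..order] (a[0] = 1) so that
   A(z) = 1 + sum a[i] z^-i whitens the frame; 1/A(z) is the spectral
   envelope. Returns the remaining prediction-error energy. */
double levinson_durbin(const double *r, double *a, int order)
{
    double tmp[MAX_ORDER + 1];
    double err = r[0];

    assert(order >= 1 && order <= MAX_ORDER);
    a[0] = 1.0;
    for (int i = 1; i <= order; i++) a[i] = 0.0;
    if (err <= 0.0) return 0.0;               /* silent frame: A(z) = 1 */

    for (int i = 1; i <= order; i++) {
        double acc = r[i];
        for (int j = 1; j < i; j++) acc += a[j] * r[i - j];
        double k = -acc / err;                /* reflection coefficient */
        for (int j = 1; j < i; j++) tmp[j] = a[j] + k * a[i - j];
        for (int j = 1; j < i; j++) a[j] = tmp[j];
        a[i] = k;
        err *= 1.0 - k * k;                   /* error shrinks each order */
        if (err <= 0.0) break;                /* numerically singular */
    }
    return err;
}
```

Calling `autocorrelation` and then `levinson_durbin` on each short frame of speech gives one set of envelope coefficients per frame; orders around 10 are common for narrowband speech.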

And it is simple: the often-used Levinson-Durbin algorithm is just 22 lines of C code. Building your own vocoder from scratch that fits on a single screen is an interesting exercise.
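In that spirit, a sketch of the synthesis half of such a vocoder in C. It assumes the per-frame predictor coefficients `a[0..order]` (with `a[0] = 1`) and a gain have already been estimated from the voice (the modulator), for example with Levinson-Durbin. The carrier, such as a synth sawtooth, is run through the all-pole filter gain / A(z), and the filter memory is carried across frames so the coefficients can change without resetting the filter. `allpole_state`, `allpole_init`, and `vocoder_synth_frame` are hypothetical names for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ORDER 32  /* hypothetical upper bound on the predictor order */

/* Past outputs of the all-pole filter, kept between frames. */
typedef struct {
    double y[MAX_ORDER];  /* y[0] = previous output, y[1] = one before, ... */
    int order;
} allpole_state;

void allpole_init(allpole_state *st, int order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    st->order = order;
    for (int i = 0; i < MAX_ORDER; i++) st->y[i] = 0.0;
}

/* One frame of cross-synthesis:
   out[t] = gain * carrier[t] - sum_{i=1..order} a[i] * out[t-i],
   i.e. the carrier takes on the envelope 1/A(z) estimated from the voice. */
void vocoder_synth_frame(allpole_state *st, const double *a, double gain,
                         const double *carrier, double *out, size_t n)
{
    int p = st->order;
    for (size_t t = 0; t < n; t++) {
        double v = gain * carrier[t];
        for (int i = 1; i <= p; i++) v -= a[i] * st->y[i - 1];
        for (int i = p - 1; i > 0; i--) st->y[i] = st->y[i - 1];  /* shift history */
        st->y[0] = v;
        out[t] = v;
    }
}
```

One simple choice for the gain is derived from the prediction-error energy that the analysis step returns; the frame size, gain normalization, and any interpolation of coefficients between frames are left out of this sketch.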

Many of the code snippets I have seen (which have likely already processed your voice) are just translations of the Fortran code from the book "Linear Prediction of Speech" by Markel and Gray (1976).

Ah yes, ladder or lattice filters. If you don't mind old-fashioned mailing lists, there are still a few hanging around, such as MUSIC-DSP@LISTS.COLUMBIA.EDU, where code gets shared.
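A sketch of an all-pole lattice filter in C, assuming the sign convention A(z) = 1 + a[1] z^-1 + ... + a[p] z^-p, under which the reflection coefficients k[1..p] (a by-product of the Levinson-Durbin recursion) map to a[] through the step-up recursion. The lattice produces the same output as the direct-form filter 1/A(z), and it is stable whenever every |k[i]| < 1, which is one reason lattice forms show up in speech coding. `lattice_state`, `lattice_init`, `lattice_allpole_step`, and `reflection_to_lpc` are hypothetical names.

```c
#include <assert.h>

#define MAX_ORDER 32  /* hypothetical upper bound on the filter order */

typedef struct {
    double b[MAX_ORDER + 1];  /* b[i] = previous sample's backward error b_i[n-1];
                                 b[order] is written but never read */
    int order;
} lattice_state;

void lattice_init(lattice_state *st, int order)
{
    assert(order >= 1 && order <= MAX_ORDER);
    st->order = order;
    for (int i = 0; i <= MAX_ORDER; i++) st->b[i] = 0.0;
}

/* One sample of all-pole synthesis: the excitation enters as the
   top-stage forward error f_p[n]; the output is f_0[n]. */
double lattice_allpole_step(lattice_state *st, const double *k, double e)
{
    double f = e;
    for (int i = st->order; i >= 1; i--) {
        f -= k[i] * st->b[i - 1];            /* f_{i-1}[n] */
        st->b[i] = st->b[i - 1] + k[i] * f;  /* b_i[n]     */
    }
    st->b[0] = f;                            /* b_0[n] = output */
    return f;
}

/* Step-up recursion: reflection coefficients k[1..order] -> a[0..order]. */
void reflection_to_lpc(const double *k, double *a, int order)
{
    double tmp[MAX_ORDER + 1];
    assert(order >= 1 && order <= MAX_ORDER);
    a[0] = 1.0;
    for (int i = 1; i <= order; i++) {
        for (int j = 1; j < i; j++) tmp[j] = a[j] + k[i] * a[i - j];
        for (int j = 1; j < i; j++) a[j] = tmp[j];
        a[i] = k[i];
    }
}
```

For example, an impulse fed through `lattice_allpole_step` with k = {0.5, -0.3} gives 1, -0.35, 0.4225, -0.252875, ..., the same impulse response as the direct-form filter with the stepped-up coefficients a = {1, 0.35, -0.3}.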

I thought it was pitch-synchronous overlap-add (PSOLA), but I just checked and now I'm not so sure.

Has anyone got more details?

That gives me something to research! I’ve only scratched the surface of IVL’s algorithm, but intend to look into it further.