by echelon 1289 days ago
Griffin-Lim is slow and is almost certainly not being used.

A neural vocoder such as HiFi-GAN [1] can convert spectra to audio, and not just for voices: spectral inversion works well for any audio-domain signal. It's faster than Griffin-Lim and produces much higher-quality results.

[1] https://github.com/jik876/hifi-gan

If you check their about page, they do say they're using Griffin-Lim.

It's definitely a useful approach early in a project, since Griffin-Lim is so easy to implement. But I agree that these days there are other techniques that are as fast or faster and produce higher-quality audio. They're just a lot more complicated to run than Griffin-Lim.
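
To give a sense of how little code that is, here is a rough NumPy-only sketch of Griffin-Lim. It is not the project's code. The helper names (`stft`, `istft`, `griffin_lim`), the Hann window, and the defaults `n_fft=512`, `hop=128`, `n_iter=50` are all my own assumptions for illustration, and it expects a linear-magnitude spectrogram of shape (frames, n_fft // 2 + 1).

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    # Reflect-pad so every output sample is covered by several windows.
    x = np.pad(x, n_fft // 2, mode="reflect")
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)  # shape: (n_frames, n_fft // 2 + 1)

def istft(spec, n_fft=512, hop=128):
    # Windowed overlap-add, normalized by the summed squared window.
    window = np.hanning(n_fft)
    n_frames = spec.shape[0]
    length = n_fft + hop * (n_frames - 1)
    out = np.zeros(length)
    norm = np.zeros(length)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        start = i * hop
        out[start:start + n_fft] += frames[i] * window
        norm[start:start + n_fft] += window ** 2
    out = out / np.maximum(norm, 1e-8)
    # Drop the padding that stft() added.
    return out[n_fft // 2:length - n_fft // 2]

def griffin_lim(magnitude, n_iter=50, n_fft=512, hop=128, seed=0):
    # Start from random phase, then alternate between the time domain and
    # the STFT domain, keeping the known magnitude and the re-estimated phase.
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(magnitude.shape))
    for _ in range(n_iter):
        audio = istft(magnitude * phase, n_fft, hop)
        rebuilt = stft(audio, n_fft, hop)
        phase = np.exp(1j * np.angle(rebuilt))
    return istft(magnitude * phase, n_fft, hop)
```

Calling `griffin_lim(np.abs(stft(x)))` on a mono signal `x` returns an estimate of `x` with reconstructed phase. The loop runs one STFT and one inverse STFT per iteration, which is where the cost comes from.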

Author here: Indeed, we are using Griffin-Lim. It would be exciting to swap it out for something faster and better, though. In the real-time app we also run the spectrogram-to-audio conversion on the GPU, because it is a nontrivial part of the time it takes to generate a new audio clip. Any speedup there is helpful.