by rprenger 2979 days ago
Full disclosure, I currently work at Nvidia on speech synthesis.

You can definitely do this on a GPU. We use the older autoregressive WaveNet (not Parallel WaveNet) for inference on GPUs, with the newly released nv-wavenet code. Here's a link to a blog post about it:

https://devblogs.nvidia.com/nv-wavenet-gpu-speech-synthesis

That code will generate audio samples at 48 kHz, or if you're worried about throughput, it'll do a batch of 320 parallel utterances at 16 kHz.
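
To put those two numbers in perspective, here is a rough back-of-the-envelope sketch. It assumes "at 48 kHz" means 48,000 samples generated per second for one utterance, and that the batched figure means every one of the 320 utterances advances at 16,000 samples per second. The function names are made up for illustration and are not part of nv-wavenet.

```python
def per_sample_budget_us(sample_rate_hz):
    """Time available to produce one sample, in microseconds.

    An autoregressive model needs sample t before it can start on
    sample t+1, so this is the latency budget for one full pass.
    """
    return 1e6 / sample_rate_hz


def aggregate_samples_per_sec(batch_size, sample_rate_hz):
    """Total samples per second summed across all utterances in a batch."""
    return batch_size * sample_rate_hz


# Single utterance at 48 kHz: about 20.8 microseconds per sample.
single_budget = per_sample_budget_us(48_000)

# 320 utterances at 16 kHz: 5.12 million samples per second in total,
# with a looser per-step budget of 62.5 microseconds.
batch_throughput = aggregate_samples_per_sec(320, 16_000)
batch_budget = per_sample_budget_us(16_000)

print(f"{single_budget:.1f} us per sample (single, 48 kHz)")
print(f"{batch_throughput:,} samples/s total (batch of 320, 16 kHz)")
print(f"{batch_budget:.1f} us per step (batched, 16 kHz)")
```

So the batched mode trades per-utterance sample rate for roughly 100x more total samples per second, under the assumptions above.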