Hacker News
by robeastham 3564 days ago
But does anyone know if it's possible to do TTS with the recently released libraries?

Thanks for the links, but to my ear the samples on those links don't hit the mark. The WaveNet samples in the original article cross the threshold for me. So I'd like to try some short dialog tests, especially as I've read elsewhere that generating 1 second of audio takes only 4 minutes on a K80.

Any light anyone else can shed on this would be great.

1 comment

AFAIK, none of the released libraries support the TTS experiment described in the paper. DeepMind used pre-computed linguistic features to guide the system in generating natural-sounding speech, so your output will probably depend on the quality of those features. For the sake of not spreading misinformation: the 4 minutes was measured using a small model with a sampling rate of 4 kHz, which would not generate anything that sounds like the samples from DeepMind.
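
To put rough numbers on that 4-minute figure, here is a back-of-the-envelope sketch in Python. It assumes generation cost grows linearly with the number of samples (the model produces audio one sample at a time), and the 16 kHz target rate is my assumption for illustration, not something stated in this thread. The function names are made up for this example.

```python
def seconds_per_sample(wall_clock_s, audio_s, sample_rate_hz):
    """Average wall-clock time spent generating one audio sample."""
    return wall_clock_s / (audio_s * sample_rate_hz)


def wall_clock_for(audio_s, sample_rate_hz, per_sample_s):
    """Estimated wall-clock time, assuming a fixed cost per sample."""
    return audio_s * sample_rate_hz * per_sample_s


# Figures quoted in the thread: 4 minutes for 1 second of audio at 4 kHz.
per_sample = seconds_per_sample(wall_clock_s=4 * 60, audio_s=1.0, sample_rate_hz=4_000)
print(f"{per_sample * 1000:.1f} ms per sample")

# Assumption: a 16 kHz target rate with the same per-sample cost.
estimate_s = wall_clock_for(audio_s=1.0, sample_rate_hz=16_000, per_sample_s=per_sample)
print(f"{estimate_s / 60:.1f} minutes per second of 16 kHz audio")
```

Treat the second number as a floor at best: the 4-minute measurement came from a small model, and a larger one would presumably cost more per sample.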

Thanks for the clarification and for spotting the 4 kHz error. This is fascinating stuff.

Looks like I'll have to concede that voice acting is much more practical, for now at least.