by basve 3569 days ago
AFAIK none of the released libraries support the TTS experiment described in the paper. DeepMind used pre-computed linguistic features to guide the system in generating natural-sounding speech, so your output will probably depend on the quality of those features. For the sake of not spreading misinformation: the 4 minutes was measured using a small model with a sampling rate of 4 kHz, which would not generate anything that sounds like the samples from DeepMind.
1 comment

Thanks for the clarification and for spotting the 4 kHz error. This is fascinating stuff.

Looks like I'll have to concede that voice acting is much more practical, for now at least.