by PieSquared 3390 days ago
Thank you for clarifying this! We tried fairly hard to make this clear because, as you say, the hard part is generating inflection and duration that sound natural. There's still a ton of work left to do in this direction; we're clearly nowhere near being able to generate human-level speech.

Our work is meant to make working with TTS easier for deep learning researchers by describing a complete system that can be trained entirely from data, and to demonstrate that neural vocoder substitutes can actually be deployed to streaming production servers. Future work (both by us and hopefully by other groups) will make further progress on inflection synthesis!

1 comment

My "Fake News" comment aside, I think what y'all are doing could be transformational for many reasons. Imagine a scenario where a person loses a loved one, and similar technology allows them to "have conversations" with the deceased as a form of healing and closure. Not to mention, this could add a personal touch to assistant bots that will make them a pleasure to use.