HN Mirror



	by applejinn 3110 days ago
	It's still a research project and not a production system: "We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."

1 comment

To add: "Also, our system cannot yet generate audio in realtime."

For a production GCP API, I think faster-than-real-time synthesis would be necessary.

For example, WaveNet took a year to go from research to production in Google Assistant: https://deepmind.com/blog/wavenet-launches-google-assistant/