by applejinn
3110 days ago
It's still a research project and not a production system: "We manually analyze the error modes of our system on the custom 100-sentence test set from Appendix E of [11]. Within the audio generated from those sentences, 0 contained repeated words, 6 contained mispronunciations, 1 contained skipped words, and 23 were subjectively decided to contain unnatural prosody, such as emphasis on the wrong syllables or words, or unnatural pitch. In one case, the longest sentence, end-point prediction failed."

For a production GCP API, I think faster-than-real-time synthesis would be necessary.
For example, WaveNet took a year to go from research to production in Google Assistant: https://deepmind.com/blog/wavenet-launches-google-assistant/