by kamathsutra
810 days ago
The way I think about realistic conversational speech is that if you get a phone call, you should not be able to tell whether it is an AI or a human based on the voice alone. For English and even some Asian languages like Chinese, this has already happened. If you are a non-Hindi speaker and want to understand the difference, I might find it difficult to explain :P But whatever you are learning, if you start practicing with a native speaker, I am sure you will easily surpass the SoTA Hindi TTS models. Non-conversational example: https://www.youtube.com/watch?v=ayYk3XkP0ts&t=22s&ab_channel... You can listen to this and easily tell that it's AI-generated speech. However, it works very well for dubbing etc.