Hacker News
by kamathsutra 810 days ago
The way I think about realistic conversational speech is that if you get a phone call, you should not be able to tell whether it is an AI or a human just based on the voice. For English and even some Asian languages like Chinese, this has already happened.

If you are a non-Hindi speaker and want to understand the difference, I might find it difficult to explain :P But whatever you are learning, if you start practicing with a native speaker, I am sure you will easily surpass the SoTA Hindi TTS models.

Non-conversational example: https://www.youtube.com/watch?v=ayYk3XkP0ts&t=22s&ab_channel...

You can listen to this and easily tell that it's AI-generated speech. However, it works very well for dubbing etc.