Hacker News
by oezi 481 days ago
Text-to-speech models still aren't trained on data rich enough to capture all the nuances needed for fully expressive speech. For example, most models can't change accent independently of language (e.g. English with a slight French accent), and most don't let you set an emotion or state such as excitement or sleepiness.

And that's before we even talk about adding laughing, singing/rapping, or beatboxing.