by oezi
481 days ago
Text-to-speech models still aren't trained on data rich enough to capture all the nuances needed to be fully expressive. For example, most models have no way to change the accent independently of the language (e.g. English with a slight French accent), nor any way to set emotions such as excitement or sleepiness. And we aren't even talking about adding laughing, singing/rapping, or beatboxing.