by viggity
291 days ago
I feel like this is a step in the right direction, but a lot of emotive text-to-speech models only change the duration and loudness of each word; the timing and pauses are better too. I would love to have a model that can make sense of things like stressing particular syllables or phonemes to make a point.