by quickgist
493 days ago
For some reason, most of these (and other narration AIs) sound like someone reading off a teleprompter, rather than natural speaking voices. I'm not sure what exactly it is, but I'm left feeling like the speaker isn't really sure of what the next words are, and the stresses between the words are all over the place. It's like the emphasis over a sentence doesn't really match how humans sound.
Speech encodes a gigantic amount of emotion via prosody and rhythm -- how the speaker is feeling, how they feel about each noun and verb, what they're trying to communicate with it.
If you try to reproduce all the normal speech prosody, it'll be all over the place and SoUnD bIzArRe, it won't make any sense, and it'll be incredibly distracting, because there's no coherent psychology behind it.
So "reading off a teleprompter" is really the best we can do for now -- not necessarily affectless, but with a kind of "constant affect" that varies with grammatical structures and other language patterns, but with no real human psychology behind it.
It's a gigantic difference from text, which encodes vastly less information.
(And this is one of the reasons I don't see AI replacing actors for a looong time, not even voice actors. You can map a voice onto someone else's voice preserving their prosody, but you still need a skilled human being producing the prosody in the first place.)