by matisqe
1253 days ago
(ElevenLabs dev here) The way the generative voices sound is very much a function of the training data, sampling, and interpolation, as you also pointed out. Since a lot of the training data involves deep breaths, the synthesized voice will have them too, albeit sometimes at different times than a human would. Punctuation is the biggest influence on where those pauses happen. So far, users have told us they find it enjoyable to listen to and that the breathing and pauses are accurate!
As a developer, can you tell the difference between "Narration" and the human speaker? What can we listen for, or what gives it away? I listened to the "Narration" clip many times and, as a native British English speaker also confirms in another comment, it seems very difficult, if not impossible, to tell that the first clip is generated. Congratulations on such an achievement!