| HN Mirror


	by matisqe 1253 days ago
	(ElevenLabs dev here) The generative voices and the way they sound are very much a function of the training data, sampling, and interpolation, as you also pointed out. Since a lot of that data does involve deep breaths, the synthesized voice will also have them, albeit sometimes at different points than a human speaker would. Punctuation is the biggest influence on where those pauses happen. Users so far have found it enjoyable to listen to and say the breathing and pauses are accurate!

1 comment

logicallee 1253 days ago

I agree - the pauses in the first sample called "Narration" are incredibly accurate and pleasant to listen to.

As a developer, can you tell the difference between "Narration" and the human speaker? What can we listen for, or what gives it away? I listened to the "Narration" clip many times and, as a native British English speaker also confirms in another comment, it seems very difficult, if not impossible, to tell that the first clip is generated. Congratulations on such an achievement!
