HN Mirror



	by post-it 1785 days ago
	A markup language for voice that tells the generator how to inflect everything. It's not on the horizon yet, but anything that our voice can do, a computer will do someday.

4 comments

mgdlbp 1785 days ago

https://docs.microsoft.com/en-us/azure/cognitive-services/sp...

https://cloud.google.com/text-to-speech/docs/ssml

https://docs.aws.amazon.com/polly/latest/dg/ssml.html


post-it 1784 days ago

Neat! Looks like I'm behind the times already.


AnIdiotOnTheNet 1785 days ago

Then the person doing the markup becomes the talent you have to pay to make things good.


jjk166 1785 days ago

It's a lot easier to write "cries like a baby" or "screams in terror" than it is to actually do it on command, over and over again, for take after take.

And one can even imagine a program with emotional slider bars that lets a person listen to how a line sounds with different levels of inflection and then automatically inserts the appropriate markup for the settings the user selects.
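A rough sketch of what such a slider tool might emit, assuming the output is SSML (the markup the docs linked upthread describe) and using its `<prosody>` element. The function name `sliders_to_ssml`, the three sliders, and the scaling ranges (±20% pitch, 50% to 150% rate, ±6 dB volume) are made up for illustration, and which attribute values a given TTS vendor accepts may differ:

```python
from xml.sax.saxutils import escape


def sliders_to_ssml(text, pitch=0.0, rate=0.0, volume=0.0):
    """Map three hypothetical sliders in [-1, 1] to an SSML <prosody> wrapper."""

    def clamp(value):
        # Keep slider input inside the expected [-1, 1] range.
        return max(-1.0, min(1.0, value))

    # Illustrative scaling only; these ranges are assumptions, not vendor limits.
    pitch_pct = round(clamp(pitch) * 20)       # -20% .. +20% relative pitch
    rate_pct = round(100 + clamp(rate) * 50)   # 50% .. 150% of normal speed
    volume_db = round(clamp(volume) * 6)       # -6 dB .. +6 dB

    return (
        f'<speak><prosody pitch="{pitch_pct:+d}%" rate="{rate_pct}%" '
        f'volume="{volume_db:+d}dB">{escape(text)}</prosody></speak>'
    )


if __name__ == "__main__":
    # A slightly higher, slower, louder reading of one line.
    print(sliders_to_ssml("Get out of there, now!", pitch=0.5, rate=-0.5, volume=1.0))
```

A real tool would presumably play back the synthesized audio after each slider change and only write out the markup once the user is happy with the take.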


account42 1785 days ago

That person can be replaceable. Or it can be a team. And you don't need to worry about the AI tiring or damaging their vocal cords after trying out different intonations all day. And eventually there will be good enough automation to generate the intonations too - either entirely or with minimal input from a voice director.


bick_nyers 1785 days ago

Yup, exactly. Anything you would tell a voice actor to do, you put in the markup. Obviously, the voice actor can still produce higher quality, probably for a long time to come.


Cthulhu_ 1785 days ago

I believe that's already out there, since services like Alexa will do certain inflections depending on the context of what they're saying. I think.
