| HN Mirror



	by bick_nyers 1785 days ago
	The implications for the entertainment industry are massive. When I was working in indie game development, I wondered whether you could use deepfakes in place of a voice actor: basically, get a famous (or just good) voice with unlimited voice lines, without having to pay for studio time. Obviously, you would need the person to sign off on using their voice for commercial purposes.

3 comments

wishinghand 1785 days ago

There's a post on Hacker News for this, by a company called Sonantic.


chronogram 1785 days ago

How would you get the acting part of the voice acting right? I can’t imagine you wouldn’t still need a skilled voice actor for that.


M277 1785 days ago

Yeah, the acting part is a valid concern. A mod for The Witcher 3 does this to give the main character voiced dialogue[1], but it doesn't really sound.... right. I mean, it is voiced and some lines feel authentic, but some lines also just feel odd.

[1]: https://www.gamesradar.com/witcher-3-mod-uses-ai-to-create-n...


Cthulhu_ 1785 days ago

Not if you're going to voice the Elcor from Mass Effect.

"With barely contained terror. You drive a hard bargain."


post-it 1785 days ago

A markup language for voice that tells the generator how to inflect everything. It's not on the horizon yet, but anything that our voice can do, a computer will do someday.


mgdlbp 1785 days ago

https://docs.microsoft.com/en-us/azure/cognitive-services/sp...

https://cloud.google.com/text-to-speech/docs/ssml

https://docs.aws.amazon.com/polly/latest/dg/ssml.html
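For a rough idea of what this kind of markup looks like, here is a minimal sketch that wraps a line of dialogue in SSML. It uses only the standard `speak`, `break`, and `prosody` elements; the helper name `build_ssml` and its parameters are made up for illustration, and each vendor supports its own subset and extensions, so check their docs before relying on any attribute.

```python
import xml.etree.ElementTree as ET


def build_ssml(line, rate="medium", pitch="medium", pause_ms=0):
    """Wrap one line of dialogue in basic SSML prosody markup.

    build_ssml is a hypothetical helper, not part of any vendor SDK.
    rate and pitch take the standard SSML labels such as "slow" or "high".
    """
    speak = ET.Element("speak")
    if pause_ms > 0:
        # Optional pause before the line, e.g. for a hesitant delivery.
        ET.SubElement(speak, "break", {"time": f"{pause_ms}ms"})
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = line  # ElementTree escapes &, < and > on serialization
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("You drive a hard bargain.", rate="slow", pitch="low", pause_ms=300))
```

The resulting string is what you would send to a text-to-speech service as the SSML input instead of plain text.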


post-it 1783 days ago

Neat! Looks like I'm behind the times already.


AnIdiotOnTheNet 1785 days ago

Then the person doing the markup becomes the talent you have to pay to make things good.


jjk166 1784 days ago

It's a lot easier to write "cries like a baby" or "screams in terror" than it is to actually do it on command, over and over again, for take after take.

And one can even imagine a program with emotional slider bars that lets a person listen to how a line sounds with different levels of inflection and then automatically inserts the appropriate markup for the settings the user selects.
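A minimal sketch of that slider idea, assuming two made-up sliders ("excitement" and "urgency") that each run from 0.0 to 1.0 and map onto the standard SSML prosody labels. The slider names, the mapping, and the function names are all hypothetical; a real tool would presumably need finer control than five buckets.

```python
from xml.sax.saxutils import escape

# Standard SSML prosody labels, ordered from lowest to highest.
PITCH_LEVELS = ["x-low", "low", "medium", "high", "x-high"]
RATE_LEVELS = ["x-slow", "slow", "medium", "fast", "x-fast"]


def level(slider, labels):
    """Map a slider value in [0.0, 1.0] to one of the given labels."""
    clamped = min(max(slider, 0.0), 1.0)
    # Split the range into equal buckets; 1.0 falls into the last one.
    index = min(int(clamped * len(labels)), len(labels) - 1)
    return labels[index]


def sliders_to_ssml(line, excitement, urgency):
    """Turn slider settings into SSML markup for one line (hypothetical mapping)."""
    pitch = level(excitement, PITCH_LEVELS)  # assumption: excitement raises pitch
    rate = level(urgency, RATE_LEVELS)       # assumption: urgency speeds up delivery
    return (
        f'<speak><prosody pitch="{pitch}" rate="{rate}">'
        f"{escape(line)}</prosody></speak>"
    )


if __name__ == "__main__":
    print(sliders_to_ssml("Run!", excitement=1.0, urgency=1.0))
```

The user would drag the sliders, listen to the result, and keep whatever markup the final settings produce.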


account42 1784 days ago

That person can be replaceable. Or it can be a team. And you don't need to worry about the AI tiring or damaging its vocal cords after trying out different intonations all day. And eventually there will be good enough automation to generate the intonations too, either entirely or with minimal input from a voice director.


bick_nyers 1785 days ago

Yup, exactly. Anything you would tell a voice actor to do, you would put in the markup. Obviously, the voice actor can still produce higher quality, probably for a long time to come.


Cthulhu_ 1785 days ago

I believe that's already out there, since services like Alexa will do certain inflections depending on the context of what they're saying. I think.


echelon 1785 days ago

I'm working on https://vo.codes

The new version is almost ready to launch.

I've also got voice to voice conversion working, and I'm trying to make it real time. It's pretty close.
