by giveupitscrazy 1468 days ago
I definitely recognized the voices of real people alive today in your example set. I assume it's some kind of trained AI that you feed hours of content so it can learn their speech patterns. The question is, are you paying royalties to those individuals for their contributions?

I'm not. I'll try to reach out and figure out a license of some sort. I suspect the royalties we could pay out are probably negligible. I think it might be safest for me to get rid of the "recognizable" voices and simply create new synthetic voices that are not of real people.
I'm pretty sure tech like this is where self-driving tech was 6-10 years ago, in the sense that there are no laws addressing it. No one wrote a law saying "a driver must be in the driver's seat of a car" ahead of time. YouTube has already reinstated a Jay-Z audio deepfake that was originally taken down.

https://www.theverge.com/2020/4/28/21240488/jay-z-deepfakes-... has more details.

I agree, I think entirely synthetic voices will be the way a lot of services like this can operate in the future. Unfortunately I haven't seen much research in this area. Guess it's outside the typical "take a dataset and optimize the hell out of it" realm of a lot of ML research since the synthetic voice will not exist in any dataset ahead of time.

Been thinking a lot about how to accomplish this myself for a similar product I'm building, glad to hear someone else is thinking about it too!

Would it be possible to mix multiple people/voices in a training set? Or does that confuse the AI? Could be interesting to create a real but kind of not real model. If that makes sense…
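
On the mixing question: a rough sketch of one way it could work, assuming a multi-speaker TTS model that conditions on a fixed-length speaker embedding. That is an assumption; nothing in this thread says how the product's model is built. Under that assumption, a "real but kind of not real" voice would be a weighted blend of several speakers' embeddings. The function name and the toy 3-dimensional vectors below are hypothetical and for illustration only.

```python
import math
from typing import List, Sequence


def blend_speaker_embeddings(
    embeddings: Sequence[Sequence[float]],
    weights: Sequence[float],
) -> List[float]:
    """Return a unit-length weighted average of speaker embeddings.

    Hypothetical helper: assumes each speaker is represented by a
    fixed-length vector, as in embedding-conditioned multi-speaker TTS.
    """
    if not embeddings or len(embeddings) != len(weights):
        raise ValueError("need one weight per embedding")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must have the same length")

    # Weighted average, computed one dimension at a time.
    mixed = [
        sum(w * e[i] for e, w in zip(embeddings, weights)) / total
        for i in range(dim)
    ]

    # Rescale to unit length so the blend has the same norm as a
    # normalized single-speaker embedding.
    norm = math.sqrt(sum(x * x for x in mixed))
    if norm == 0:
        raise ValueError("blended embedding has zero length")
    return [x / norm for x in mixed]


# Toy vectors standing in for real speaker embeddings (illustrative only).
speaker_a = [1.0, 0.0, 0.0]
speaker_b = [0.0, 1.0, 0.0]
blended = blend_speaker_embeddings([speaker_a, speaker_b], [0.5, 0.5])
```

Whether the blended vector produces a natural-sounding voice depends entirely on the model, and a blend of real people's voices does not by itself settle the licensing question raised above.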
Plenty of internet lawyers came out to rabidly defend GitHub's right to pirate data to feed into Copilot, so I wouldn't be that worried about IP. I would be more worried about picking the wrong voices, such as those with strong political connotations.
That's a great insight. I'll look into the GitHub Copilot situation more. I can't code without Copilot anymore, so that piques my interest. But yeah, we def need politically neutral voices.
On the flip side, I found the "professor" voice endearing. I'm not necessarily a fan (although not a hater either), but I thought it was:

A. Impressive display of capability

B. A very clever choice, as it's recognizable but not as universal as, say, Obama's voice or Joe Rogan's would be

C. Brilliant marketing

I probably wouldn't offer it as a "real" voice for use in bulk through the API due to the legal concerns, but on the marketing page it's really cool and I would hang on to it. Plus if you get sued it would be great publicity :-)

Haha, I really appreciate this feedback. Everything I wanted to hear.

Frankly, I'm trying to fight against Goliath with zero marketing budget, and I'm desperately hoping to create noise. Breaking a rule or two is something AWS can't do at their scale.

On the bulk offering, I do have a clear path forward. In short, I can create new synthetic voices. It’s like those this-person-does-not-exist images but for voice. “unreal speech”

This is typical brute behavior. It's not about breaking a rule or two for the sake of success, it's about doing it while riding on people's backs. You think that stealing someone's voice is OK for a small startup because you don't have the budget to know better, but that is not the issue at hand. You obviously know this is wrong and are trying to get away with it by riding on people's good intentions.

This is blatant, clearly immoral behavior, and you are fully aware of it. You just think that you have the right to reach success not by your own skills, but by stepping on other people, and until they complain you'll keep doing it.

PS: Even if Amazon does the same thing, it's no excuse for you. Look up the tu quoque fallacy.

The "Professor" voice is very clearly Jordan Peterson, which is likely what you're getting at.