Hacker News
by jascii 1177 days ago
I'm having a hard time coming up with a non-nefarious use case for this.
7 comments

I'd get a kick out of having my own blog posts read to me in James Earl Jones's voice.

Or, heck, my own voice. Though it'd be surreal to hear not-me-but-me saying things I've never said.

Even this is ethically questionable. James Earl Jones's voice is his livelihood.
While that is true, I'm not suggesting a pattern of behavior - just that it would be fun to hear.
We have been seeing some of these genuine use cases: youtube creators, audiobooks, elearning videos, podcasts, commercials, dubbing, and gaming.
BS. That could just be done without imitating someone's voice.
No one is going to listen to an audiobook made with this. It's still fundamentally just TTS.
Have you tried it? I've listened to two generated audiobooks so far, and it has been great.
I am toying with building virtual puppet software in the style of watchmeforever. I have a number of voices I do for the stage and DnD that I would be willing to train a few models on so I could give my puppets unique voices.
Anything written can be listened to with this tech. Any news article, any short story, a draft of a piece of writing you're working on. There is too much text for human beings to read it all.
> There is too much text for human beings to read it all.

So your logic is that all that text should be audio and people will consume more? Because I've got news for you: reading is faster than listening.

When I said there's too much text for human beings to read it all, I meant that it isn't feasible to pay people to record audio of every text that someone might want to listen to. A random blog written by someone in their spare time probably isn't going to hire a voice actor.

I think the case for having all text be listenable is pretty clear. We're all really busy and often our hands are busy but we're not doing something that mentally stimulating. This is an ideal time to listen to an audiobook, a blog, the news, or whatever else you'd like.

Oh yeah, how does reading work out for you while you're driving a car? smh...
And all AI bots are here to generate even more text. :( We will need to rethink and reevaluate lots of things that we are used to.
I'm using this kind of technology for temporary voice tracks in animated shorts.

I'd really like something like Img2Img for voices so I can translate a performance to an arbitrary (synthetic) voice.

Tortoise TTS can do this. You just pass it your example as a conditioning latent.
Generating audio for an audio book: If an author could speak for 20 minutes and then generate audio for an entire book from the book's text and the model, I think that would be very useful.
20 seconds*
The OP mentioned that so-called "high-fidelity voice cloning" would take 20 minutes of training. I think a book author would want the best quality possible to reproduce their voice.
Why reproduce their voice? There's no value-add there.
Many people prefer an audiobook version of a book to be read by the original author, which isn't always the case. If an author could make that version happen by using 20 minutes of their time + text2speech of the whole book, that would be an immensely positive value proposition on the side of this company.

But I'm not sure. Part of why I'd prefer the original author to read a book is that they vocally emphasize certain parts of the book, and I don't think these models could do that at this point.

> Many people prefer an audiobook version of a book to be read by the original author

Right, but having AI read the book in the author's voice is definitely not the author reading the work.

As you mention, the reason that people like to hear the author read it is because it's the author reading it, theoretically emphasizing and acting things out according to what was intended. It's not just to hear the author's voice.

So I don't see what the value-add is.

Voice generator tech has created some decent surreal memes (like audio recordings of Biden, Obama, and Trump playing video games together).

Outside of memes or maybe the occasional well-intentioned prank, I really can't think of anything either.

Massively reducing costs for voice-over in video games. This should even make it feasible to create mods with audio, which would be great :)
I would consider studios taking voice actors' voices and using them to generate new content beyond their contract to be abuse. I'm sure big corporations are rubbing their hands in anticipation, but I'm sure killing the VA industry will make the world just a tiny bit worse for everyone else.

Mods are more difficult to attach a moral judgement to. I don't think I'd really consider them malicious, as long as they're not sold, but there's a very thin line between a high quality mod and stealing someone's voice.

I think it will probably kill the current business model of the VA industry. Having the ability to generate as much audio content as you like without the risk of the VA not being available anymore (dead, booked out, ...) is just too good to pass up.

Instead we will probably see licenses for generated voices. And in the case of games, the game developer could make the voice model freely available for mods of their game. (The mods are already using assets from the game, so why not audio too?)

Machine-generated content cannot be copyrighted, so I doubt companies will switch to AI-generated voices for big games for that reason.

Voices can't be copyrighted either, so I don't see how a license for a generated voice would even work.

On the other hand, why shouldn't voice actors benefit from this tech?

I can easily imagine a future where AI-generated impersonations are deemed by courts or new legislation to be protected by personality rights. In that world, voice actors could expand their business by offering deeply discounted rates for AI-generated work.

Alternatively, if/when tech like Play.ht is consistently good enough, maybe it just becomes a standard practice for all voice acting work to include a combination of human- and AI-generated content, like a programmer using Copilot or a writer using GPT.

I'm sure programmers would love to expand their business opportunities by offering deeply discounted rates for creating AI-generated code.

No? Then why do you assume that someone else would want to do the same in their profession?

As AI-generated content is not protectable under IP law, it's a non-starter for games, film, TV, or music for anything except background filler.

Sure, why not? If you could earn more money and produce more value to society with the same amount of labor, and the legal/regulatory environment supported it, I wouldn't see a reason not to.

If you had a solo contracting business, and the technology existed to fully outsource a development project to AI based on carefully documented requirements, using it would be a cheaper alternative to subcontracting. Rather than writing every line of code by hand, you would transition to becoming an architect, project manager, code reviewer, and QA tester. Now you're one person with the resources and earning potential of an entire development shop.

I have my fair share of complaints about AI coding tools, but that isn't one of them. Maybe the increase in supply would result in a lower average software engineering income, but it wouldn't have to if demand kept pace with supply.

Furthermore, code is more fungible than a person's voice. If someone wants a particular celebrity's voice, that celebrity has a monopoly on it. Thus, it's not obvious that increasing the supply of one's voice acting work would decrease its value. (I suspect the opposite to be the case, until a point of diminishing returns.)

The voice acting case does raise a similar concern, though: will we get an explosion in new and/or higher-quality media, or will we see a consolidation to a smaller number of well-known voice actors taking an outsized amount of work? Another issue, if we look beyond impersonation specifically, is that human voices may become marginalized over time in favor of entirely synthetic voices. I imagine that this would start with synthetic voices playing minor roles alongside human/human-impersonated voices, but over time certain synthetic voices would organically become recognizable in their own right.

Again, I see plenty of concerns with AI in general, but more of a mixed bag than strictly negative, and there isn't anything inherently nefarious about this product in particular.

Personally, I'm optimistic about what society looks like in the long run if humanity proves to be a responsible steward of increasingly advanced AI. By the time we're at a point where 90% of people can be effectively automated out of a job, we'll have had to have figured out some alternative way of distributing resources among the population, i.e. a meaningful UBI backed by continued growth of our species' collective wealth and productivity. I can easily imagine a not-too-distant world that is effectively post-scarcity, where it's not frowned upon to spend years (or lifetimes) on non-income-generating pursuits, and where the only jobs performed by humans are entrepreneur, executive, politician, judge, general, teacher, and other things that must be done by humans for one reason or another.

So am I happy that AI is encroaching on skilled labor? In the short term, not necessarily. But it's not necessarily bad either, it's the reality that we're in, and long-term I'm more optimistic than not.

Star Trek: Prodigy has already used audio from previous movies and TV to bring back to life several actors from previous series. It's not exactly the same as this, but their dialogue was taken out of context to create new scenes and story.
I know, and I almost wish they had used AI for that segment because it was pretty jarring (especially the TOS recordings).

There's still a huge difference between "reusing the work the studio paid for" and "recreating your voice forever after doing a single project".

I think “talking” with dead relatives or friends will become real pretty soon.

If people can find comfort hearing their mom say words of encouragement in a tough situation, I think a lot of people would do it. It's kind of hard, though, because for some others that would mean never getting closure.

Weird stuff is certainly about to happen…

The last thing on earth I'd want is for any aspect of my dead relatives to be reanimated through technology. No. That's absolutely fucking horrific to consider. I don't need a hallucinating AI pretending to be my dead wife. That's literally shambolic.

There is vastly more potential for that to be abused by others than used in any emotionally or socially constructive way.

I would also find that very creepy and it would probably keep you from moving on. I think there is a big difference between remembering what happened by looking at a photo or hearing an audio recording and having newly generated "content" from a deceased loved one.