by kevinmrose 1243 days ago
The technology does sound interesting, but I'm having a hard time seeing how this could be used for anything good.
4 comments

It raises awareness of the fact that such techniques are possible.

The generative AI results of the past year straddle the invention/discovery divide -- I've seen snarky internet takes recently along the lines of "why are tech bros solving art and poetry instead of our tedious labor", but what people not clued into the field don't get is that:

(a) This generative stuff is easier specifically because it doesn't have to interact with the real world. It's just data manipulation, there's a ton of it to learn from, and it's basically ok if a lady in a picture has six fingers or if the poem you generated has an incorrect meter on line 8. IRL interactions are harder to get data on, and nobody wants their bulldozer AI to hallucinate that the building next door is part of what needs to be demolished.

(b) These generative tools can exist precisely because there is regular statistical structure in (art/language/music). It takes some work to capture it, but it's the information's structure itself that enables, e.g., cloning a voice from a few seconds of recordings. Making research that exposes that fact taboo just means that actors who exploit it anyway will have a bigger advantage in abusing it.

It can also help to understand where a line of research comes from -- in VALL-E's case, it's an outgrowth of research into neural audio compression, an area with very clear advantages for information technology. If one starts shipping that technology and parts of it can be used for few-shot cloning of others' voices, it seems better that we're all aware of the fact.

IMO there's also a problem with solving "tedious labor": you will start displacing workers with automation at a rate the employment market cannot compensate for.
A personal avatar responding (with approval) in your own voice? Reading to kids in a familiar voice? Recording your own videos where the script copy is written by ChatGPT, the voice is generated by VALL-E, and the video by (?). I have to produce regular videos for training and other purposes, and this sounds ideal.
Hadn't thought of those use cases, but it does seem like it could be pretty helpful in those scenarios, especially when doing extended narration for training and tutorial videos. It is kind of worrying that phone scammers, hackers and other bad actors could leverage this technology for much more nefarious purposes, which was the thought that drove my initial reaction.
One person could voice an entire cast of characters in an animated production or video game.
Synthetic voice acting for deceased artists.