by hmoodie 1220 days ago
I don't think this is true at all. Maybe I'm vastly underestimating how good these AIs are or soon will be.

I recently listened to the Vorkosigan saga, ~15 books all narrated by one person (Grover Gardner). Across these books there are probably 100 different voiced characters, and some of them show up in one book and then don't show back up until 5 or so books later. They each have their own voice and the voice is consistent throughout the whole series.

Is this something you think AI is capable of, or will soon be capable of?

3 comments

I don't think you're underestimating these AI systems

But I think you're overestimating the quality of the "average" audiobook and its narration

A sizeable swath of books don't lend themselves to such finessed narration

All of The Expanse audiobooks were narrated by Jefferson Mays, except for the third book where they got some other guy to do it. Fans were so upset by this that they basically forced the company making the audiobooks to re-record the whole book with Jefferson Mays.

I don’t think an AI is going to pull this off. Maybe with human help to tell it which characters should use which voices? Even then I think there’s a lot of “acting” in voice acting that an AI would fail miserably at. It might get 95% of it right, but I think the other 5% would be off-putting enough to outweigh that.

I think in a case like that, it would be hard for AI to get it right any time soon. But I think that case is kind of an outlier in the audiobook world. Or potentially I just listen to a lot of non-fiction.

Potentially yes. https://valle-demo.github.io/

> VALL-E emerges in-context learning capabilities and can be used to synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as an acoustic prompt.

Yeah, this is good, but just imagine how good these models will be in 5-10 years. I think they'll be indistinguishable from human speech.