Hacker News
by tmountain 1018 days ago
I was kind of hoping this was going to be human beings contributing read-aloud versions of Gutenberg content. Since it’s not, I’ll propose a cool project. Raise money to enlist high-quality voice actors to create audiobooks from Gutenberg. Release these audiobooks to the world for free. Which books come first could be voted upon. As someone who has used TTS a lot in recent projects, I’m not excited about listening to AI read a book to me. It feels soulless.
2 comments

>As someone who has used TTS a lot in recent projects, I’m not excited about listening to AI read a book to me. It feels soulless.

AI TTS is still uncanny-valley enough to distract. I prefer the even more soulless traditional TTS, which sounds "neutral" after habituation, to the point where my brain can start layering on characterization as if I were reading. AI TTS feels like listening to a mediocre voice actor, where it's hard to overwrite their creative choices, so I'm just left disappointed and annoyed.

I agree completely! I kinda like the neutral tone of a soulless robot when it knows how to stay out of the way. Far better than a bad AI _or_ a poor human reading.
I have used TTS in the past, and in the last few years there has been a quantum leap in TTS quality. Another such leap in the next few years and it will dominate the audiobook scene, for good or bad.
AI might dominate, but it would be a loss. Here’s a tutorial explaining modern audio fiction:

https://www.drabblecast.org/2018/07/30/inside-drabblecast-au...

(In audio format, of course; roughly 1.5 hours)

————

This episode takes you inside Drabblecast audio production. Ever wonder how we produce an episode of the Drabblecast? Wonder no more!

We dig into all the technical aspects like voice acting, sound editing and mixing, foley effects, music and more.

Preproduction? Reading? Acting? Yeah, it’s all here folks, all the blood sweat and tears that go into every production of the Drabblecast.

It might be worse than human narration, but at some point the economics become so lopsided that its dominance is inevitable. One good thing I can see coming out of that will be an abundance of audiobooks of copyright-expired books.
Are the economics actually better, or do they look better due to a lack of quality control? Because no TTS - even the most current AI ones - is perfect. They need corrections, which take a human's time. And it's time that dictates prices, not skill (which largely reduces time).
The key is just which takes less time. If you can listen to it once, note a few errors, and make slight adjustments, it may still be much faster to use AI.
Based on Apple’s advertised times to produce AI audiobooks, the times are comparable. AI is running neither quickly nor inexpensively for this task, it seems.
The economics are only lopsided if the cost of producing the audio version is significant compared to the cost of writing the work of fiction.
Does anyone know of any TTS available now that doesn't completely muck up foreign words? I know you can make custom pronouncing dictionaries to use with some of the open source ones, but I wonder if any of the more modern systems are good for this. I have been listening to the English news podcast from a Japanese newspaper that is made with TTS, and it gets its one job, pronouncing Japanese names and places, completely and jarringly wrong.
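
For anyone curious what the custom-dictionary approach looks like in its simplest form, here is a rough sketch: a pre-processing pass that swaps known names for respellings before the text reaches the TTS engine. Everything in it is an assumption for illustration. `LEXICON` and `apply_lexicon` are made-up names, the respellings are guesses rather than verified pronunciations, and real engines often use their own lexicon formats or phoneme input instead of plain text substitution.

```python
import re

# Hypothetical lexicon: written form -> respelling that an English TTS voice
# is more likely to read acceptably. Entries are illustrative guesses only.
LEXICON = {
    "Kyoto": "kyoh-toh",
    "Sapporo": "sahp-poh-roh",
}


def apply_lexicon(text, lexicon):
    """Replace whole-word matches of lexicon keys with their respellings."""
    if not lexicon:
        return text
    # Try longer keys first so a long entry wins over a shorter one it contains.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


if __name__ == "__main__":
    # The rewritten text is what would be handed to the TTS engine.
    print(apply_lexicon("Snow fell in Sapporo and Kyoto.", LEXICON))
```

This only helps with names you already know are a problem, which is probably why it gets tedious for a daily news feed full of new names and places.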