I have used TTS in the past, and in the last few years there has been a quantum leap in TTS quality. Another such leap in the next few years and it will dominate the audiobook scene, for good or bad.
It might be worse than human narration, but at some point the economics become so lopsided that its dominance is inevitable. One good thing I can see coming out of that will be an abundance of audiobooks of copyright-expired books.
Are the economics actually better, or do they look better due to a lack of quality control? Because no TTS system, even the most current AI ones, is perfect. They need corrections, which take a human's time. And it's time that dictates prices, not skill (which largely reduces time).
The key is just which process is faster. If you are able to listen to it once, note a few errors, and make slight adjustments, it may still be much faster to use AI.
Based on Apple’s advertised times to produce AI audiobooks, the times are comparable. AI is running neither quickly nor inexpensively for this task, it seems.
Does anyone know of any TTS available now that doesn't completely muck up foreign words? I know you can make custom pronunciation dictionaries to use with some of the open source ones, but I wonder if any of the more modern systems are good at this. I have been listening to the English news podcast from a Japanese newspaper that is made with TTS, and it gets its one job, pronouncing Japanese names and places, completely and jarringly wrong.
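For anyone wondering what the custom-dictionary approach looks like, here is a minimal sketch in Python. It assumes the dictionary is applied as a plain text substitution before the text reaches the TTS engine; it does not use any particular engine's lexicon format. The `LEXICON` entries and the `apply_lexicon` helper are made up for illustration, and the respellings are rough guesses, not vetted pronunciations.

```python
import re

# Hypothetical lexicon: written form -> respelling that an English-trained
# engine is more likely to read acceptably. Entries are illustrative only.
LEXICON = {
    "Sendai": "sen-dye",
    "Abe": "ah-beh",
}


def apply_lexicon(text, lexicon):
    """Replace whole-word, case-sensitive matches with their respellings."""
    if not lexicon:
        return text
    # Try longer keys first so multi-word entries win over their parts.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda m: lexicon[m.group(1)], text)


if __name__ == "__main__":
    # The result would then be handed to whatever TTS engine is in use.
    print(apply_lexicon("Abe visited Sendai.", LEXICON))
```

The word-boundary match keeps "Abe" from being rewritten inside "Abel", but somebody still has to write and maintain every entry by hand, which is the same human correction time discussed upthread.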
https://www.drabblecast.org/2018/07/30/inside-drabblecast-au...
(In audio format, of course; roughly 1.5 hours)
————
This episode takes you inside Drabblecast audio production. Ever wonder how we produce an episode of the Drabblecast? Wonder no more!
We dig into all the technical aspects like voice acting, sound editing and mixing, foley effects, music and more.
Preproduction? Reading? Acting? Yeah, it’s all here, folks: all the blood, sweat, and tears that go into every production of the Drabblecast.