by revenga99 857 days ago
Wow. I could see this as threatening audio book narrators. However I would still prefer a real narrator to this in its current state. I think what it might be missing is different voices/accents for different characters.
4 comments

Folks will probably think me silly for this, but I prefer TTS. I have access to voice actor audiobooks but I pick the .epub files instead. I made a little extension to inject window.speechSynthesis with "Microsoft Steffan Online (Natural) - English (United States)" at rate=6 when I hit a hotkey. At high speed it's much clearer and more natural sounding than a sped-up voice actor recording.
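A rough sketch of what a hotkey hook like that could look like, for anyone curious. This is not the actual extension: the Alt+S hotkey, reading the current selection, and the helper names (`pickVoice`, `clampRate`, `speak`) are all assumptions made for illustration. Only the voice name and rate=6 come from the comment above.

```typescript
// Only the fields used here; browser SpeechSynthesisVoice objects have both.
interface VoiceLike {
  name: string;
  lang: string;
}

// Voice name quoted in the comment above.
const PREFERRED_VOICE =
  "Microsoft Steffan Online (Natural) - English (United States)";

// The Web Speech API documents utterance rate as ranging from 0.1 to 10.
function clampRate(rate: number): number {
  return Math.min(10, Math.max(0.1, rate));
}

// Exact-name match; undefined means "let the browser use its default voice".
function pickVoice<T extends VoiceLike>(voices: T[], name: string): T | undefined {
  return voices.find((v) => v.name === name);
}

// Speak the text; returns false when speech synthesis is unavailable
// (for example outside a browser) or the text is empty.
function speak(text: string, rate = 6): boolean {
  const g = globalThis as any; // avoids needing DOM typings to compile
  const synth = g.speechSynthesis;
  if (!synth || typeof g.SpeechSynthesisUtterance !== "function") return false;
  if (text.trim() === "") return false;

  const utterance = new g.SpeechSynthesisUtterance(text);
  const voice = pickVoice(synth.getVoices() as VoiceLike[], PREFERRED_VOICE);
  if (voice) utterance.voice = voice;
  utterance.rate = clampRate(rate);

  synth.cancel(); // stop whatever is currently being read
  synth.speak(utterance);
  return true;
}

// Hypothetical hotkey (Alt+S): read the current selection aloud.
const root = globalThis as any;
if (typeof root.addEventListener === "function" && root.document) {
  root.addEventListener("keydown", (e: any) => {
    if (e.altKey && String(e.key).toLowerCase() === "s") {
      speak(String(root.getSelection?.() ?? ""));
    }
  });
}
```

One caveat: `getVoices()` can return an empty list until the browser has loaded its voices, so a real extension would probably also listen for the `voiceschanged` event. Which voices are installed depends on the browser and OS.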
I also prefer TTS. The spin voice actors put on the text always distracts me. With text to speech I only get what's in the text itself.

I wrote a Perl/Tk GUI script for my file manager to manage text to speech through Festival 1.96 with voice_nitech_us_awb_arctic_hts. Unlike neural network AI models, it runs fine even on very slow machines.

I think Google's product has that: https://play.google.com/books/publish/autonarrated/
That sounds pretty bad though
As an avid consumer of audio books (150+/year) - we are well past the point where narrators are necessary. Professional audio books take too long to release, are too expensive, are concentrated on a limited number of platforms and just aren't THAT much better than the automated stuff for the long tail of books.
Audible doesn't allow AI narration or much public domain stuff at the moment. The only thing keeping it from happening is the marketplaces trying to hold back a flood of crap that would drown out the better-crafted options and really annoy consumers.
Let's be honest, the moment Amazon thinks their TTS is good enough, they'll be offering AI Audible deals to every author on their platform.
The 80% solution: Pair with a professional narrator who has consented to have their voice modeled by this (see the note at the bottom about what they held back from open sourcing). This generates a beta, and then you can pay the human narrator to rework specific sections you’re unhappy with.
Yeah, hard to say, because the obvious implementation would be to just have it built into phones once the model is portable enough. I see this happening sooner as a more general TTS feature, much like what Google is doing with 'subtitles anywhere', aka Live Caption. Paired with translation, we may be pretty close to universal translator type functionality. I could see end users being able to customize their voice assistant even more, or maybe having multiple voices depending on whether it's talking for you or to you.

Anyway, the problem with this is that it makes the 'AI audiobook' product basically worthless: why not just buy the ebook and have my personalized TTS turn it into an audiobook? Now you just have market differentiation between cheap ebook + AI narrator vs. expensive + professional narration.

Though narration costs are already pretty cheap - they really don't factor into the cost of publishing an audiobook that much unless it's a really bottom-of-the-barrel book.

Thinking about this more - the copyright implications become much more interesting once it's no longer a recording. Does it count as a private performance if you have headphones on? Is it a public performance if you listen to live TTS through your speakers in public?
I'm looking forward to my on-device TTS, but Amazon has a decent moat with the DRM on their Kindles.

At least they'll have to remain somewhat competitive once consumers decide they want the AI audiobooks and the like.