by tkgally 1250 days ago
Your context-aware TTS already sounds very good. If I were using it to produce a narration for other people to listen to, I would want to make at most a couple of minor adjustments every few sentences. Most of those adjustments would fall into a few categories: stronger or weaker stress on a particular word, rising or falling intonation on a phrase, longer or shorter pauses between words, and correction of the phonemes in a word. A half dozen toggles for those adjustments might be enough for most cases.
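For what it's worth, most of these categories already have counterparts in SSML, the W3C Speech Synthesis Markup Language: `<emphasis>` for stress, `<prosody>` for pitch, `<break>` for pauses, and `<phoneme>` for pronunciation. SSML isn't mentioned above, and whether this particular TTS accepts it is an assumption. Below is a minimal Python sketch of four of the toggles as SSML snippets. The helper names (`stress`, `intonation`, `pause`, `pronounce`) are hypothetical. Engines also differ in how much SSML they honor, especially the `contour` attribute.

```python
from xml.sax.saxutils import escape, quoteattr


def stress(word: str, level: str = "strong") -> str:
    """Stronger or weaker stress. SSML emphasis levels: strong, moderate, none, reduced."""
    return f"<emphasis level={quoteattr(level)}>{escape(word)}</emphasis>"


def intonation(phrase: str, rising: bool = True, amount: str = "20%") -> str:
    """Rising or falling pitch across a phrase, via an SSML prosody contour.

    The contour is a list of (position, pitch change) pairs; engine support varies.
    """
    end = ("+" if rising else "-") + amount
    contour = f"(0%,+0%) (100%,{end})"
    return f"<prosody contour={quoteattr(contour)}>{escape(phrase)}</prosody>"


def pause(ms: int) -> str:
    """Insert an explicit pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def pronounce(word: str, ipa: str) -> str:
    """Override the phonemes of a word with an IPA transcription."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


# Example: stress "never", pause, then end the question on a rising contour.
ssml = (
    "<speak>I "
    + stress("never")
    + " said that"
    + pause(400)
    + intonation("did you hear me", rising=True)
    + "?</speak>"
)
print(ssml)
```

Each toggle wraps only the span it affects, so an editor UI could attach one of these to a selected word or phrase without touching the rest of the sentence.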

I wonder, though, how much training people would need to recognize which adjustments to make. Experienced actors and narrators should have a good sense of what to fix, but many people might have trouble identifying what sounds strange in the initial TTS output and how to change it.