Hacker News new | ask | show | jobs
by oidar 649 days ago
The voice models for this are very good. I'd love to have granular control over the output of a model like this locally.
1 comments

Like SSML? See azure tts or google cloud tts, or ibm Watson or even old school system tts like SAPI voices on windows. But I hear you. In a VITS typical model system ssml isn’t standard. Piper tts does have it on the roadmap.
I just want programmable prosody. Prosodic controls would allow much more believable TTS - apple used to have it on the earlier TTS models, but these new TTS models sound so natural at the phoneme level, but the prosody is often jacked up so that it's easily identifiable as artificial.