by yourapostasy 4145 days ago
For TTS, compare further with Vocalware and CereProc

Vocalware: https://www.vocalware.com/index/demo
CereProc: https://www.cereproc.com/

It is getting increasingly difficult to pick one as the clear leader for "natural sounding". The results are good enough for voicing canned text, and certainly better enunciated than many thick-accented English speakers. Text parsing, though, still has room to improve through training.

For example, IBM Watson reads "IT" as the pronoun "it" in the following sentence:

Thank you for calling the IT department.

Vocalware and CereProc correctly parse that.

The people I would really like to hear from are professional voice actors, though they would understandably be leery of lending a hand to improve TTS. Is there a standardized way of writing text that captures the emphasis, placement of silence, and warping of phonemes these actors use in their delivery to concisely convey emotion, and that TTS products could adopt?

1 comment

SSML (Speech Synthesis Markup Language) is a markup language that has some degree of popularity in the field. The specific section on markup for emphasis is http://www.w3.org/TR/speech-synthesis11/#S3.2
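
To make that concrete, here is a rough sketch of how the "IT department" sentence might be marked up. The `emphasis`, `break`, and `sub` elements come from the SSML 1.1 spec; the particular emphasis level, pause length, and the alias "I T" are my own illustrative choices, and how any given TTS engine renders them is not guaranteed. The Python wrapper (`SSML_DOC`, `spoken_text`) is hypothetical and only checks that the markup is well-formed and shows the text after `sub` aliases are applied.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the SSML 1.1 specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"

# Illustrative markup (attribute values are assumptions, not recommendations):
# - <emphasis> stresses a phrase
# - <break> inserts a pause
# - <sub> tells the engine to speak the alias instead of the written text
SSML_DOC = f"""<speak version="1.1" xmlns="{SSML_NS}" xml:lang="en-US">
  <emphasis level="strong">Thank you</emphasis> for calling
  <break time="300ms"/>
  the <sub alias="I T">IT</sub> department.
</speak>"""


def spoken_text(ssml: str) -> str:
    """Approximate the words an engine would read: apply <sub> aliases, drop tags."""
    root = ET.fromstring(ssml)  # raises ParseError if the markup is malformed
    parts = []

    def walk(el):
        if el.tag == f"{{{SSML_NS}}}sub":
            # Substitute the alias for the element's written content.
            parts.append(el.get("alias", el.text or ""))
            return
        parts.append(el.text or "")
        for child in el:
            walk(child)
            parts.append(child.tail or "")

    walk(root)
    # Collapse the whitespace left over from the indented markup.
    return " ".join("".join(parts).split())


if __name__ == "__main__":
    print(spoken_text(SSML_DOC))
```

The `sub` alias sidesteps the "IT" vs. "it" ambiguity by spelling out the letters explicitly, at the cost of hand-annotating the text. It does not answer the broader question of whether SSML is expressive enough for what voice actors do.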