Hacker News
by popalchemist 495 days ago
Alternatively, text that is input to these services should first be passed through a normalization step, e.g. using LLAMA to convert kanji to hiragana or a romanization. The TTS output is then much better.

Unfortunately, a simple normalization of kanji --> hiragana throws away pronunciation information.
You could just as easily use the LLM to convert the kanji into phonemes.
You can't afford to lose word boundaries, and phonemes alone don't tell you which part of the word is emphasized.
Modern TTS engines use tokenizers to convert words to phonemes. See: https://github.com/FunAudioLLM/CosyVoice/issues/202
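
To make the information loss concrete, here is a minimal sketch in Python. It assumes the "pronunciation information" in question is pitch accent, which the comment above does not spell out. The lexicon is a hand-written toy table (three standard-Tokyo-accent homophones), and `to_hiragana` / `to_annotated` are hypothetical helper names, not part of any TTS library.

```python
# Toy lexicon, hand-written for illustration only (not a real dictionary API).
# surface form -> (hiragana reading, accent position in morae; 0 = no pitch drop)
LEXICON = {
    "箸": ("はし", 1),  # chopsticks: pitch drops after the first mora
    "橋": ("はし", 2),  # bridge: pitch drops after the second mora
    "端": ("はし", 0),  # edge: flat, no drop
}


def to_hiragana(words):
    """Plain normalization: concatenate readings, discarding boundaries and accent."""
    return "".join(LEXICON[w][0] for w in words)


def to_annotated(words):
    """Keep one (reading, accent) pair per word so boundaries and accent survive."""
    return [LEXICON[w] for w in words]


# After plain normalization, all three words are indistinguishable.
plain = {w: to_hiragana([w]) for w in LEXICON}
# With per-word annotation, they remain distinct.
annotated = {w: to_annotated([w]) for w in LEXICON}

# Two different word sequences collapse to the same hiragana string.
print(to_hiragana(["橋", "端"]) == to_hiragana(["箸", "箸"]))  # True
print(to_annotated(["橋", "端"]))
```

Whether a given TTS engine can actually consume accent or boundary annotations like these depends on its frontend, so treat this as an illustration of what gets lost rather than a recipe.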