Hacker News
by yorwba 2299 days ago
Note that it won't work for Taiwanese (I assume Hokkien) unless you add the necessary support to espeak-ng.

If your lyrics are in Peh-oe-ji, you'll need to define how the romanization maps to phonemes. You may be able to get some inspiration for that from the definitions for Mandarin and Cantonese. That said, I just looked at the "Phonology" section on Wikipedia (https://en.wikipedia.org/wiki/Taiwanese_Hokkien#Phonology), and the tone sandhi rules look a lot more complex than those of any other Sinitic language I know.
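To make the first step of that mapping concrete, here is a rough Python sketch that splits a Peh-oe-ji syllable into its unmarked letters and a tone number by reading the combining diacritics. It assumes the usual tone-mark convention (acute = 2, grave = 3, circumflex = 5, macron = 7, vertical line above = 8, unmarked = 1, or 4 before a final p/t/k/h). The names `TONE_MARKS` and `split_tone` are made up for illustration. It does not map letters to phonemes and does not attempt tone sandhi, and real espeak-ng support would live in espeak-ng's own rule files rather than in Python.

```python
import unicodedata

# Combining diacritic -> tone number (assumed POJ convention).
# Tones 1 and 4 carry no mark, so they are handled separately below.
TONE_MARKS = {
    "\u0301": 2,  # acute accent
    "\u0300": 3,  # grave accent
    "\u0302": 5,  # circumflex
    "\u0304": 7,  # macron
    "\u030d": 8,  # vertical line above
}

NASAL_MARK = "\u207f"  # superscript n used for nasalised vowels


def split_tone(syllable):
    """Return (letters without tone mark, tone number) for one syllable."""
    # NFD separates base letters from their combining marks.
    decomposed = unicodedata.normalize("NFD", syllable.lower())
    tone = None
    base_chars = []
    for ch in decomposed:
        if ch in TONE_MARKS:
            tone = TONE_MARKS[ch]
        else:
            base_chars.append(ch)
    base = unicodedata.normalize("NFC", "".join(base_chars))
    if tone is None:
        # Unmarked syllable: checked (stop-final) syllables are tone 4,
        # everything else is tone 1.
        final = base.rstrip(NASAL_MARK)[-1:]
        tone = 4 if final in ("p", "t", "k", "h") else 1
    return base, tone


if __name__ == "__main__":
    # "Pe̍h-ōe-jī" written with escapes so the code points are unambiguous.
    for syl in "Pe\u030dh-\u014de-j\u012b".split("-"):
        print(split_tone(syl))
```

The output of `split_tone` is only the citation tone. Any sandhi rules would have to be applied afterwards, across the whole phrase.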

If the lyrics use Chinese characters, there's the added difficulty of collecting a pronunciation dictionary, which I'd probably do by scraping https://twblg.dict.edu.tw/holodict_new/index.html, http://xiaoxue.iis.sinica.edu.tw/ccr/ and Wiktionary. (If you know of any other sources for pronunciation data, I'm interested.)
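Once the data from several sources is on disk, combining it could look roughly like the Python sketch below. It assumes each source has already been turned into a plain dict from a word or character to a list of romanised readings. The function name `merge_readings` and the sample entries are illustrative only, and nothing here does any scraping.

```python
import unicodedata
from collections import defaultdict


def merge_readings(*sources):
    """Merge several {word: [readings]} dicts into one, without duplicates."""
    merged = defaultdict(set)
    for source in sources:
        for word, readings in source.items():
            for reading in readings:
                # Normalise to NFC so the same reading typed with precomposed
                # or combining diacritics is only counted once.
                merged[word].add(unicodedata.normalize("NFC", reading))
    return {word: sorted(readings) for word, readings in merged.items()}


if __name__ == "__main__":
    # Hand-written sample data standing in for two scraped sources.
    source_a = {"人": ["lâng", "jîn"]}
    source_b = {"人": ["la\u0302ng"], "台": ["tâi"]}
    print(merge_readings(source_a, source_b))
```

Keeping every reading rather than picking one means the choice between literary and colloquial pronunciations still has to be made per lyric.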

1 comment

Yes, I know about romanisation! I wrote Pingtype, and extracted romanisation dictionaries for Taiwanese Hokkien and Hakka by parsing Bible data.

https://pingtype.github.io

Tones are difficult, so I encode those as colours. Adding code to espeak-ng sounds very difficult. Most of the songs are in Mandarin though, so I'll try those first.
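As a rough idea of what encoding tones as colours can look like in code, here is a small Python sketch that wraps a syllable in an HTML span coloured by its tone number. The palette and the names `TONE_COLOURS` and `colourise` are invented for this example and are not Pingtype's actual scheme.

```python
import html

# Hypothetical palette: tone number -> CSS colour. Not Pingtype's real mapping.
TONE_COLOURS = {
    1: "#d62728",
    2: "#ff7f0e",
    3: "#2ca02c",
    4: "#1f77b4",
    5: "#9467bd",
    7: "#8c564b",
    8: "#17becf",
}

DEFAULT_COLOUR = "#000000"  # used for neutral or unknown tones


def colourise(syllable, tone):
    """Return an HTML span showing the syllable in its tone colour."""
    colour = TONE_COLOURS.get(tone, DEFAULT_COLOUR)
    return f'<span style="color:{colour}">{html.escape(syllable)}</span>'


if __name__ == "__main__":
    print(colourise("tai", 5))
```

Unknown tone numbers fall back to black, so a missing dictionary entry stays visible instead of raising an error.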