Hacker News
by dhosek 1355 days ago
There are some interesting variations in different scripts thanks to how they were handled in pre-Unicode encodings. Perhaps the most interesting divergence is in the various scripts derived from the old Brahmi script. These are all abugidas (as are the Japanese kana) where vowels do not exist independently of consonants. But in Thai, for example, the syllable NA is written นา with น and า treated as separate characters, while in Devanagari, NA is written ना where न is the N sound and the A sound ा is a spacing mark which changes the shape and spacing of the first letter to give ना. Although a Thai reader will read the combination of consonant and vowel as a single entity, they are treated as two graphemes by Unicode, while the equivalent in Devanagari is a single grapheme (and it’s not simply because they’re printed connected since नाना will be connected but treated as two graphemes).
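To make the grapheme difference concrete, here is a minimal Python sketch using only the standard `unicodedata` module. The helper `approx_grapheme_count` is a hypothetical name, and it is only a rough approximation that counts every code point that is not a combining mark. The real rules are in UAX #29 and handle more cases (viramas, ZWJ, Thai SARA AM, and so on), so treat this as an illustration rather than a segmenter.

```python
import unicodedata

def approx_grapheme_count(text: str) -> int:
    """Rough approximation: count code points that are NOT combining marks.

    General categories starting with "M" (Mn, Mc, Me) are treated as
    attaching to the preceding base. This is a simplification of UAX #29,
    not a full grapheme cluster segmenter.
    """
    return sum(1 for ch in text if not unicodedata.category(ch).startswith("M"))

THAI_NA = "\u0E19\u0E32"        # น + า (THAI CHARACTER NO NU + SARA AA)
DEVA_NA = "\u0928\u093E"        # न + ा (DEVANAGARI LETTER NA + VOWEL SIGN AA)
DEVA_NANA = DEVA_NA * 2         # नाना

for label, s in [("Thai NA", THAI_NA),
                 ("Devanagari NA", DEVA_NA),
                 ("Devanagari NANA", DEVA_NANA)]:
    categories = [unicodedata.category(ch) for ch in s]
    print(label, categories, approx_grapheme_count(s))
```

The Thai vowel า has general category Lo (a letter in its own right), while the Devanagari ा has category Mc (a spacing combining mark), which is why the two NA syllables count differently.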

Perhaps most interesting in this respect is the comparison between the Devanagari ि and the Thai ใ which both appear before the consonant that they’re attached to, but in Thai the input will be ใ + ค to get ใค (so you input in the order of appearance rather than the order of pronunciation) while in Devanagari, the input would be क + ि to get कि (so you input in pronunciation order rather than graphic order).
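The ordering difference shows up directly in the stored code points. A small sketch, again with only the standard `unicodedata` module (the helper `codepoint_names` is a hypothetical name used just for this illustration):

```python
import unicodedata

def codepoint_names(text: str) -> list[str]:
    """Return the Unicode character name of each code point, in stored order."""
    return [unicodedata.name(ch) for ch in text]

THAI_SYLLABLE = "\u0E43\u0E04"  # ใ + ค: the vowel is stored first (visual order)
DEVA_SYLLABLE = "\u0915\u093F"  # क + ि: the consonant is stored first (spoken order)

for s in (THAI_SYLLABLE, DEVA_SYLLABLE):
    for ch, name in zip(s, codepoint_names(s)):
        print(f"U+{ord(ch):04X} {name}")
    print()
```

In the Thai string the vowel sign comes first in memory, matching what you see on the page. In the Devanagari string the consonant comes first, and the renderer is responsible for drawing the vowel sign to its left.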

2 comments

Japanese kana are syllabaries, not abugidas.
> … in Thai, for example, the syllable NA is written นา with น and า treated as separate characters, while in Devanagari, NA is written ना where न is the N sound and the A sound ा is a spacing mark which changes the shape and spacing of the first letter to give ना.

Worth noting that «-a» is often implied in both Thai and Devanagari (and in nearly all other Brahmi- and Pali-derived scripts): the speaker supplies it implicitly, so it is dropped from the spelling most of the time, except in specific cases.

The -a sound (ะ) isn't dropped in Thai in most cases. It's only dropped in specific cases, mostly in words of Pali/Sanskrit origin.