|
|
|
|
|
by pepa65
825 days ago
|
|
The issue I am aware of is with the Thai language that has zero-length unicode codepoints that get superimposed on the preceding non-zero-length unicode codepoint preceding it (or if none is present, an 'empty' non-zero-length placeholder). A non-zero-length unicode codepoint can have multiple zero-length unicode codepoints following it. (In Thai, no more than 2 for morphemically correct words.) For sorting, a normalization needs to happen in the order of these zero-length codepoints in order for unicode codepoint sorting to be correct. The standard practice in Thai is to have vowel signs before tone markers. In recent years, application support for this has greatly improved. |
|