by ivanbakel
2066 days ago
As well as being rather closed-minded, this is also not true. The contents of the U+0000-U+FFFF code points (the Basic Multilingual Plane) are public knowledge, and the biggest users of space are:
1. the private use area
2. the general "CJK" area
The second of these has a truly mind-boggling number of characters, including every possible precomposed Hangul syllable used in modern Korean, even though all of them can be constructed from the basic Hangul jamo code points. Emoji and other symbols that aren't used for language take up relatively little of it. Certainly there is no reason to believe that UCS-2 would be sufficient for writing if they were removed. Even repurposing the private use area would not make room for all the scripts Unicode encodes, so UTF-16 would have been invented regardless.
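To illustrate the point about Hangul: a precomposed syllable is canonically equivalent to a sequence of conjoining jamo, so normalizing to NFD splits it apart and NFC puts it back together. A quick check with Python's standard `unicodedata` module (the syllable 한 is just an arbitrary example):

```python
import unicodedata

# 한 (U+D55C) is one of the precomposed syllables in the Hangul Syllables block
syllable = "\uD55C"

# NFD breaks it into conjoining jamo: leading consonant, vowel, trailing consonant
jamo = unicodedata.normalize("NFD", syllable)
print([f"U+{ord(c):04X}" for c in jamo])  # → ['U+1112', 'U+1161', 'U+11AB']

# NFC recombines the jamo into the single precomposed code point
assert unicodedata.normalize("NFC", jamo) == syllable
```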
Unicode strives for round-trip compatibility with source character sets, and in this case KS X 1001 (KS C 5601 at the time) is a main culprit: it had 2,350 common syllables (out of 11,172 possible) precomposed. However, Korea also had supplementary character sets beyond KS X 1001, and these were subsequently added to Unicode 1.1 (bringing the total to some 6,000 characters), before it was decided that an algorithmically derived section of all 11,172 syllables was better. This whole situation is now known as the "Hangul mess".
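The "algorithmically derived section" is the Hangul Syllables block, U+AC00 to U+D7A3: every syllable's code point is computed from the indices of its leading consonant (19 choices), vowel (21) and optional trailing consonant (27, plus "none"), giving 19 × 21 × 28 = 11,172. Below is a minimal Python sketch of that arithmetic, using the constants the Unicode Standard gives for Hangul syllable composition; `compose` and `decompose` are my own helper names, and the loop simply cross-checks them against Python's built-in normalization tables.

```python
import unicodedata

# Constants for the Hangul Syllables block, as given in the Unicode Standard
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28  # T_COUNT includes "no trailing consonant"
N_COUNT = V_COUNT * T_COUNT             # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT             # 11,172 syllables in total


def compose(l: int, v: int, t: int = 0) -> str:
    """Build a syllable from jamo indices (t == 0 means no trailing consonant)."""
    return chr(S_BASE + (l * V_COUNT + v) * T_COUNT + t)


def decompose(syllable: str) -> tuple[int, int, int]:
    """Recover the (leading, vowel, trailing) jamo indices of a syllable."""
    s = ord(syllable) - S_BASE
    if not 0 <= s < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    return s // N_COUNT, (s % N_COUNT) // T_COUNT, s % T_COUNT


# Cross-check the arithmetic against Python's own NFD data for every syllable
for s in range(S_COUNT):
    ch = chr(S_BASE + s)
    l, v, t = decompose(ch)
    expected = chr(L_BASE + l) + chr(V_BASE + v) + (chr(T_BASE + t) if t else "")
    assert unicodedata.normalize("NFD", ch) == expected
    assert compose(l, v, t) == ch

print(S_COUNT)  # → 11172
```

Because the mapping is pure arithmetic, none of the 11,172 syllables needs its own decomposition table entry, which is the point of replacing the ad hoc Unicode 1.1 repertoire with the full block.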