Hacker News new | ask | show | jobs
by eviks 343 days ago
But they don't have that explosion if you only encode the combinatoric primitives those characters are made of and then use composing rules?
1 comments

You still get the combinatoric explosion, but you have more bits to work with. Imagine if you could combine any 9 jamo into a single hangul syllable block. (The real combinatorics is more complicated, and I don't know if it's this bad.) Encoding just the 24 jamo and a control character requires 25 codepoints. Giving each syllable block its own codepoint would require 24^9 > 2^32 codepoints.
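
A quick check of that arithmetic, as a rough sketch: the 9-jamo block is the hypothetical from the comment above, not the real hangul composition rules, and the variable names are made up for illustration.

```python
# Hypothetical model from the comment: 24 jamo, exactly 9 per syllable block.
JAMO_COUNT = 24
JAMO_PER_BLOCK = 9  # assumption for illustration, not the real hangul rule

# Composable scheme: one codepoint per jamo plus one control character.
composable_codepoints = JAMO_COUNT + 1

# Precomposed scheme: one codepoint for every possible 9-jamo block.
precomposed_codepoints = JAMO_COUNT ** JAMO_PER_BLOCK

print(composable_codepoints)                 # 25
print(precomposed_codepoints)                # 2641807540224
print(precomposed_codepoints > 2 ** 32)      # True
print(precomposed_codepoints.bit_length())   # 42
```

So under these assumptions the precomposed scheme needs 42 bits per codepoint, while the composable one fits in 5.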
> Giving each syllable block its own codepoint

That's the thing - you wouldn't do that! Only a small subset of frequently used combos would get its own ID; the rest would only be composable.
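
Roughly what that hybrid could look like, as a toy sketch only. The primitives, the "frequent" list, the ID layout, and the `encode_block` helper are all invented here for illustration and do not correspond to any real encoding.

```python
# Toy hybrid encoding (all names and IDs are hypothetical):
# every primitive gets an ID, one control ID terminates a composed block,
# and only the blocks listed as frequent get a dedicated ID of their own.
PRIMITIVES = ["g", "a", "n", "m", "o"]            # stand-ins for jamo
FREQUENT_BLOCKS = [("g", "a", "n"), ("m", "o")]   # assumed frequent combos

PRIMITIVE_ID = {p: i for i, p in enumerate(PRIMITIVES)}
END_OF_BLOCK = len(PRIMITIVES)                    # control character
BLOCK_ID = {b: END_OF_BLOCK + 1 + i for i, b in enumerate(FREQUENT_BLOCKS)}


def encode_block(block):
    """Encode one block as a list of IDs."""
    block = tuple(block)
    if block in BLOCK_ID:
        # Frequent combo: a single dedicated ID.
        return [BLOCK_ID[block]]
    # Rare combo: spell it out from primitives, then terminate it.
    return [PRIMITIVE_ID[p] for p in block] + [END_OF_BLOCK]


frequent = encode_block(("g", "a", "n"))
rare = encode_block(("n", "o", "m"))
total_ids = len(PRIMITIVES) + 1 + len(FREQUENT_BLOCKS)

print(frequent)   # [6]
print(rare)       # [2, 4, 3, 5]
print(total_ids)  # 8
```

The ID space grows with the number of frequent combos you choose to include, not with the number of possible combos.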