Hacker News
by 201984 33 days ago
Why wouldn't 8 bytes be enough? Surely 18,446,744,073,709,551,616 characters is enough for every writing system in the world.

Because that's not how Unicode works. It's not simply a table mapping numbers to all possible symbols. Unicode is full of special codepoints that have no meaning on their own; they serve as modifiers to other symbols, and a single visible symbol can be formed by an arbitrarily long (in theory) combination of such codepoints. It doesn't matter how you encode it: it simply doesn't work as "codepoint -> symbol", so indexing a Unicode string by visible symbol is never O(1) and cannot be made O(1). Could we use a simple table approach? Maybe. But it wouldn't be Unicode.
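
To make that concrete, here's a minimal Python sketch using only the standard `unicodedata` module. The helper `visible_symbols` is a hypothetical name, and its rule (a base character plus any following combining marks) is a deliberate simplification of real grapheme segmentation, which also has to handle things like joiners and emoji sequences. It's only meant to show that one visible symbol can span several codepoints and that finding the n-th symbol means scanning from the start.

```python
import unicodedata

def visible_symbols(text):
    """Group codepoints into rough 'visible symbols'.

    Simplified rule (an assumption, not full grapheme segmentation):
    a codepoint with a nonzero canonical combining class attaches to
    the symbol before it.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch) != 0:
            clusters[-1] += ch      # modifier: extend the previous symbol
        else:
            clusters.append(ch)     # base character: start a new symbol
    return clusters

decomposed = "e\u0301"              # 'e' + COMBINING ACUTE ACCENT
composed = "\u00e9"                 # precomposed 'é'
stacked = "a\u0301\u0323\u0308"     # one 'a' carrying three combining marks
word = "cafe\u0301s"                # "cafés" in decomposed form

print(len(decomposed), len(visible_symbols(decomposed)))  # 2 codepoints, 1 symbol
print(len(stacked), len(visible_symbols(stacked)))        # 4 codepoints, 1 symbol
print(repr(word[3]), repr(visible_symbols(word)[3]))      # bare 'e' vs 'e' plus accent
```

Indexing by codepoint (`word[3]`) hands back a bare "e" with its accent cut off, while getting the fourth visible symbol requires walking the whole string first.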
I actually wonder if the combinatorial explosion of attempting to enumerate every possible character combination would exceed 2^64 values. My intuition is that it might, and such a system would also be unworkably unwieldy. The size of the spec document would suffer from the same combinatorial explosion. Imagine a system that tries to encode a unique entry for every possible Zalgo character.
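
A rough back-of-the-envelope check of that intuition, as a Python sketch. The function name `min_stack_depth` and the figure of 100 combining marks are assumptions for illustration, not counts taken from the standard, and treating every ordered sequence of marks as a distinct symbol overcounts somewhat, since some orderings are canonically equivalent.

```python
def min_stack_depth(num_marks, limit=2**64):
    """Smallest n for which num_marks**n exceeds limit.

    num_marks**n is the number of ordered sequences of n combining
    marks that could be stacked on a single base character.
    """
    if num_marks < 2:
        raise ValueError("need at least 2 marks for the count to grow")
    depth = 0
    sequences = 1
    while sequences <= limit:
        sequences *= num_marks
        depth += 1
    return depth

# Hypothetical: 100 distinct combining marks available per base character.
print(min_stack_depth(100))   # stack depth at which one base alone outgrows 2^64
```

Under those assumptions, stacks of ten marks on a single base character already outnumber 2^64, and Zalgo text routinely stacks more than that.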

Also, literally nobody wants to use 64-bit values to encode ASCII characters. Even in our world of insanely large storage, that would be breathtakingly wasteful.
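
For a sense of scale, a small Python sketch comparing sizes for a plain ASCII string. The 8-bytes-per-character encoding is hypothetical (Python has no such codec), so `fixed_width_size` just multiplies it out.

```python
def fixed_width_size(text, bytes_per_code_point=8):
    """Size in bytes if every codepoint took a fixed-width slot.

    The 8-byte default models the hypothetical 64-bit encoding
    discussed here; it is not a real codec.
    """
    return len(text) * bytes_per_code_point

sample = "Hello, world!"

utf8_size = len(sample.encode("utf-8"))        # 1 byte per ASCII character
utf32_size = len(sample.encode("utf-32-le"))   # 4 bytes per codepoint, no BOM
fixed64_size = fixed_width_size(sample)        # 8 bytes per codepoint

print(utf8_size, utf32_size, fixed64_size)
```

For ASCII-only text the hypothetical 64-bit form is eight times the size of UTF-8 and twice the size of UTF-32.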

Agreed, but it will take many generations for people to see characters in textual strings mainly as “code” instead of “data”.