Hacker News new | ask | show | jobs
by drewmate 3498 days ago
That's a really interesting proposal, but I'm afraid it would be difficult to implement in practice. If this third dimension were actually encoded into the number that represents each character, you'd end up with a lot of wasted bits (since most characters probably wouldn't even need the 3rd dimension, or at least as much of it as the heaviest users.) Another option would be to supplement the metadata that already accompanies Unicode characters (which block it is in, the name of the character/block, etc...) This could be done in practice now, but the information would almost certainly just be ignored if it needed to be looked up in a supplemental table. Furthermore, it's difficult to agree on just about anything in Unicode, and classifying all the characters based on concept seems like a Herculean task for a slow-moving body.

Any ideas for how to accomplish this in practice?

2 comments

I'll get to that as soon as I make email secure by design.
This already exists in Unicode, it's called "Variation Selectors" and they have their own block and are used to select emoji skin tones amongst other things.

But it would be wrong to use them in this case because an IPA G and the letter G are semantically different things and should not be unified into a single character just because they look similar.