by carapace
3384 days ago
Unicode is a conflation of two ideas, one good and the other impossible. The good idea is to have a standard mapping from numbers to little pictures (glyphs, symbols, kanji, ideograms, cuneiform pokings in dried clay, scratches on a rock, whatever). This is really all ASCII was. The impossible idea is to encode human languages into bits. This can't be done and will only continue to cause heartache in those who try. ASCII had English letters but wasn't an encoding for English, although you can use it for that, and everyone did and still does.
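The "good idea" (a standard table from numbers to symbols) is easy to see from Python's standard `unicodedata` module. This is an illustration, not something from the thread; `describe` is a hypothetical helper that only looks up the code point and the character's official name.

```python
import unicodedata

def describe(ch: str) -> tuple[str, str]:
    """Return the code point (as U+XXXX) and the standard Unicode name of one character."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch)

# A letter, a symbol, a kanji, a cuneiform sign: all entries in one numbered table.
for ch in ["A", "€", "漢", "\U00012000"]:
    code, name = describe(ch)
    print(code, name)
# U+0041 LATIN CAPITAL LETTER A
# U+20AC EURO SIGN
# U+6F22 CJK UNIFIED IDEOGRAPH-6F22
# U+12000 CUNEIFORM SIGN A

# Below 128 the table agrees with ASCII, so ASCII text is also valid UTF-8 byte for byte.
assert "A".encode("ascii") == "A".encode("utf-8") == bytes([65])
```

Note that nothing in this table says which language a symbol belongs to or how symbols combine into words; it is only the number-to-symbol mapping.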
Yes, the goal of encoding all human languages into bits is one that's nearly impossible. Unicode tries, and has broken half-solutions in many places. Lots of heartache everywhere.
That is completely irrelevant to the discussion here, though. Code points not always mapping to graphemes is only a problem because programmers ignore it; theoretically speaking, it's a completely solved problem. The distinction is necessary to handle many scripts, but it's not something that "breaks" Unicode.