> Unicode maybe should have been three dimensional, with "concept of G" in the 2D space, and "ways of representing G" behind G, along the third axis. All ways of representing G, whether little capital, capital, lower case, would or at least could equate to conceptual G in the 2D space. It brings up interesting, long-standing problems. Which of these count as the same letters?
>
> * Letters in two languages with the same appearance and making the same phonetic sound
> * Letters in two languages with the same appearance but making slightly different phonetic sounds. E.g., R in English and French
> * Letters in two languages that are otherwise the same, but one has an accent. Is the accent part of the letter? Separate? Are they really the same letter?
> * Letters in two languages with the same appearance but making completely different phonetic sounds.
> * Similar (by any property) letters in two related languages; e.g., both Indo-European
> * Similar (by any property) letters in two unrelated languages; e.g., French and Vietnamese.
> * Letters with the same phonetic sound but different appearances.
> * Letters with the same appearance, one is phonetic and one an ideograph
> * Letters that are otherwise identical, but alphabetize differently in their respective languages
> * EDIT: Forgot a key one; Letters that are otherwise identical, but follow different rules of how they combine with the letters around them (a common issue, though not familiar to English speakers).
> * Letters that are in all ways identical but belong in different languages. In which language's code group does the letter belong? One? Both? What if the subset of Unicode supported by an application includes one language but not the other? etc. etc.
It gets worse than this. For example, the letters Ä and Ö exist in both Swedish and German.
In German they are actually counted as the letters A and O with diaereses above them, and they alphabetize together with other instances of the letters A and O, because that's what they are.
In Swedish those are their own letters, completely separate from A and O. They get their own place in the alphabet (second-to-last and last, respectively), and, unlike in German, replacing them with AE and OE is technically not acceptable in Swedish (though it's often done anyway, e.g. on airline tickets).
And in Unicode they are represented by the same code points (U+00C4 and U+00D6), even though in one language each is a letter in its own right, and in the other it's only a variation on another letter. What a mess.
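To make the consequence concrete, here is a minimal Python sketch, assuming a deliberately simplified uppercase-only alphabet. The code point for "Ä" is identical in both languages, so sorting "correctly" needs language-specific rules layered on top. The helper names `german_key` and `swedish_key`, the `SWEDISH_ALPHABET` table, and the sample words are hypothetical illustrations; real software would use the Unicode Collation Algorithm with CLDR locale tailorings (for example via ICU) rather than hand-written keys like these.

```python
import unicodedata

# The same code point is used for "Ä" whether the text is German or Swedish.
A_UMLAUT = "\u00c4"  # LATIN CAPITAL LETTER A WITH DIAERESIS
# Unicode also treats it as canonically equivalent to A + COMBINING DIAERESIS.
assert unicodedata.normalize("NFD", A_UMLAUT) == "A\u0308"

# Hypothetical, simplified Swedish alphabet (uppercase only), for illustration.
SWEDISH_ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZÅÄÖ"


def german_key(word: str) -> tuple[str, str]:
    """Approximate German dictionary order: Ä/Ö sort together with A/O."""
    # Decompose, then drop combining marks so "Ä" compares as "A".
    decomposed = unicodedata.normalize("NFD", word.upper())
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Break ties between otherwise-equal words using the original spelling.
    return (base, word)


def swedish_key(word: str) -> list[int]:
    """Approximate Swedish order: Å, Ä, Ö are separate letters after Z."""
    # Compose first so "Ä" is a single character; raises ValueError for
    # letters outside the simplified alphabet above.
    composed = unicodedata.normalize("NFC", word).upper()
    return [SWEDISH_ALPHABET.index(c) for c in composed]


WORDS = ["Zebra", "Äpple", "Apa", "Ört", "Ost"]
print(sorted(WORDS, key=german_key))   # ['Apa', 'Äpple', 'Ört', 'Ost', 'Zebra']
print(sorted(WORDS, key=swedish_key))  # ['Apa', 'Ost', 'Zebra', 'Äpple', 'Ört']
```

The same five strings, made of the same code points, end up in two different orders depending only on which language's rules the sort key encodes.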