by indil
1377 days ago
>Reversing a string is a useless operation in the real world

I'm not sure why you focused on this one example, which was just meant to indicate the nature of the issue, not cite a broad concrete problem. There are plenty of situations where you'd want to operate on graphemes, not code points, like deleting the previous grapheme in a text editor. It would certainly help programmers write correct code if the two were the same.

>doing away with combining marks and encoding everything as precomposed would be impossible because you cannot have a definitive list of every single combination of letters and diacritics that may mean something to someone

It seems to me it would be trivial to enumerate these combinations, and assign code points to them. For example, the Germanic umlaut is only used with vowels, so that's at most 5 code points.
Well, 10 code points, because vowels can be capitalized, and 12, because ÿ (along with its capital, Ÿ) is used in other languages.
That's one of the easiest cases. Now you need to go through _every_ other language which has _ever_ been used in human history and repeat that process for every combining character. Note also that in some languages it's valid to keep stacking a fair number of combining modifiers so you'd need to cover every permutation allowed in each of them, and spend a lot of time working with linguists and classicists to make sure you weren't removing obscure combinations which are actually needed.
At the end of years of work, you'd have an encoding which is easier for C programmers to think about, but in which all of your documents require substantially more storage than they used to.
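
If you want to poke at this yourself, here is a rough Python sketch using only the standard `unicodedata` module. The helper name `precomposed` is made up for this example. It asks NFC normalization whether a base letter plus combining marks collapses into a single existing code point. It is only an illustration of the counting argument above, not a survey of what any language actually needs.

```python
import unicodedata
from typing import Optional

DIAERESIS = "\u0308"  # COMBINING DIAERESIS (the umlaut mark)
MACRON = "\u0304"     # COMBINING MACRON

def precomposed(base: str, marks: str = DIAERESIS) -> Optional[str]:
    """Return the single precomposed code point for base + marks,
    or None if NFC normalization cannot collapse it into one.
    (Hypothetical helper, named for this example only.)"""
    composed = unicodedata.normalize("NFC", base + marks)
    return composed if len(composed) == 1 else None

# The 12 umlaut/diaeresis cases from the comment: a e i o u y, both cases.
vowels = "aeiouyAEIOUY"
found = [precomposed(v) for v in vowels]
print(found)

# A letter with no precomposed diaeresis form stays as two code points.
print(precomposed("q"))

# Stacked marks: u + diaeresis + macron happens to have a precomposed
# form, but the same stack on q does not.
print(precomposed("u", DIAERESIS + MACRON))
print(precomposed("q", DIAERESIS + MACRON))
```

All 12 vowel cases come back as single code points, while the `q` cases come back as `None`. That gap is the point: every base letter and mark stack without an existing precomposed form would need a newly assigned code point under the proposal.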