|
|
|
|
|
by Charlotte_Buff
1384 days ago
|
|
Reversing a string is a useless operation in the real world. Its only application is padding out interview questions. “How to reverse a string” is also an incredibly vague question. What do you actually want me to do? Reverse code points, or code units, or grapheme clusters, or make it look like it’s written backwards? It doesn’t even make sense as a concept in most of the world’s writing systems. It’s like giving me a list of numbers and asking me to “combine” them. What does that mean? Do I sum them up, or concatenate them, or something else entirely? A lot of string reversal solutions are “incorrect” because there isn’t even a correct question in the first place. Even with an infinitely large code space, doing away with combining marks and encoding everything as precomposed would be impossible because you cannot have a definitive list of every single combination of letters and diacritics that may mean something to someone. If Unicode had been the first digital character set ever created, it would not contain a single precomposed code point because they are utterly impractical. As such, normalisation – or at least the canonical reordering part of it – is always going to be a necessity. |
|
I'm not sure why you focused on this one example, which was just meant to indicate the nature of the issue, not cite a broad concrete problem. There are plenty of situations where you'd want to operate on graphemes, not code points, like deleting the previous grapheme in a text editor. It would certainly help programmers write correct code if the two were the same.
>doing away with combining marks and encoding everything as precomposed would be impossible because you cannot have a definitive list of every single combination of letters and diacritics that may mean something to someone
It seems to me it would be trivial to enumerate these combinations, and assign code points to them. For example, the Germanic umlaut is only used with vowels, so that's at most 5 code points.