by imron
3371 days ago
> So it's somehow C's fault that Unicode uses variable-length encoding

The parent said string handling in C was elegant. My point is that it becomes fraught with (even more) issues once you throw non-English text at it. It was C's decision to handle strings this way, and it's the habit of many C programmers to treat every string as if it were just a char pointer you can step through one character at a time. That's a recipe for bugs.
I've heard (mostly here) that Swift does something different and treats grapheme clusters (roughly, what a reader perceives as a single character) as the basic unit. I haven't had a chance to look at precisely what that does. Given all the issues I've seen elsewhere, I'm skeptical that anyone can pull that off correctly.
UTF-8 at least has one elegance (there's that word again) in its design: you can do some "dumb" ASCII things, and if your code doesn't know what to do with fancy Unicode, you can check the high bit of any given octet and "safely" skip over it and any adjacent non-ASCII sequence. That works because every byte of a multibyte UTF-8 sequence has the high bit set, so none of them can be mistaken for an ASCII character. Whether that's good enough depends on the task at hand.