by falsedan
3212 days ago
> Any benefit you get from using UTF-16 vanishes the moment you need to operate on it like a string, in other words.

So, don't decode to a string, and do all your character manipulation on the bytes.

> A better solution is to allow programmers to specify string encoding and default it to UTF-8.

Absolutely not: the internal representation of a string should be of no interest to a user of your language. The 'best' solution is to represent strings as a list of index lookups into a palette, and to update the palette as new graphemes are seen. This is similar to the approach Perl6 is using[0].

[0]: https://6guts.wordpress.com/2015/12/05/getting-closer-to-chr...
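For concreteness, here is a rough Python sketch of the palette idea. It is only an illustration, not how Perl6 actually implements it: `PaletteString` and `split_clusters` are made-up names, the grapheme splitting is a crude approximation (a base code point plus any combining marks that follow, with no normalization and no support for full Unicode segmentation rules such as emoji ZWJ sequences), and the palette here is per-string rather than shared.

```python
import unicodedata


def split_clusters(text):
    """Approximate grapheme split: base code point + trailing combining marks.

    Assumption: this is a simplification, not full Unicode text segmentation.
    """
    clusters = []
    for ch in text:
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch  # attach the combining mark to the previous base
        else:
            clusters.append(ch)
    return clusters


class PaletteString:
    """Hypothetical string type: a list of indices into a palette of graphemes."""

    def __init__(self, text):
        self.palette = []   # index -> grapheme cluster
        self._lookup = {}   # grapheme cluster -> index
        self.indices = []   # the string itself, one index per grapheme
        for cluster in split_clusters(text):
            if cluster not in self._lookup:
                # New grapheme seen: grow the palette.
                self._lookup[cluster] = len(self.palette)
                self.palette.append(cluster)
            self.indices.append(self._lookup[cluster])

    def __len__(self):
        return len(self.indices)

    def __getitem__(self, i):
        return self.palette[self.indices[i]]

    def __str__(self):
        return "".join(self.palette[i] for i in self.indices)
```

With this, length and indexing work per grapheme, and the user never sees the underlying encoding: `PaletteString("e\u0301te\u0301")` has length 3 even though the input is five code points.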
|
WHAT?!? I suppose that you've only ever worked with Latin characters. Please show a code example of changing European to African in this sentence in your language of choice, working on the bytes in any multi-byte encoding:
מהי מהירות האווירית של סנונית ארופאית ללא משא?
Yes, that is a Hebrew Monty Python quote. Now try it with a smiley somewhere in the string (HN filtered out my attempt to post the string with a smiley).
Is each application supposed to maintain its own dictionary of code points? If the map is to live in a library, then why not have it in the language itself?