|
|
|
|
|
by est
4667 days ago
|
|
Because some of us are pissed that some BMP characters takes 3 bytes in UTF8, that's 50% more waste of storage space and 50% more time to read/write. I like the design of Python 3.3 encoding. ASCII takes 1 byte, BMP takes 2 bytes, everything else 4 bytes. http://www.python.org/dev/peps/pep-0393/ |
|
So far as I know, Haskell is the only other language that I know of that exposes, as the defaultish-native interface, Unicode strings as a sequence or iterable of code points (by just using UTF-32). Java, C#, your-language-here all do code units. C++'s template are powerful enough that someone could make unicode_str<encoding_to_store_as>, but I've not seen one.
See: http://www.unicode.org/glossary/#code_point http://www.unicode.org/glossary/#code_unit