by zorkw4rg
2810 days ago
I'm not so sure other languages do that any better (Node.js doesn't even support non-Unicode filenames at all, for instance). Modern Python does a pretty good job of supporting Unicode; it's very far from being a "mess", and calling it one is just not true. People always like to hate on Python, but other languages, supposedly designed by more capable people, mess up other things all the time. Look at how the great Haskell represents strings, for instance, and what a clusterfuck[1] that is.

[1] https://mmhaskell.com/blog/2017/5/15/untangling-haskells-str...

1. it has proper, validated Unicode strings (though the stdlib is not grapheme-aware, so manipulating these strings is not ideal)
2. it has proper bytes, entirely separate from strings
3. it has the "the OS layer is a giant pile of shit" type, OsString, because file paths might be random bags of bytes (UNIX) or random bags of 16-bit values (Windows), and possibly some other hare-brained scheme on other platforms, but I don't believe Rust supports other OsString representations currently
4. and it has nul-terminated bag o'bytes CString
For the latter two, conversion to a "proper" language string is explicitly known to be lossy, and the developer has to decide what to do in that case for their application.
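
For anyone who hasn't used these, here is a rough sketch of how the four types and their conversions look in practice. The function names (`parse_utf8`, `lossy_utf8`, `display_name`, `strict_name`, `to_c_string`) are made up for illustration; only the `std` calls inside them are real.

```rust
use std::ffi::{CString, OsString};

/// Bytes are not strings: turning them into &str means validating UTF-8,
/// and that validation can fail.
fn parse_utf8(bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(bytes).ok()
}

/// The lossy route for raw bytes: every invalid sequence is replaced
/// with U+FFFD, so the original bytes cannot be recovered afterwards.
fn lossy_utf8(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}

/// The lossy route for OS strings (file names, env vars, ...).
/// Fine for display, not for reopening the same file later.
fn display_name(raw: &OsString) -> String {
    raw.to_string_lossy().into_owned()
}

/// The strict route: on failure the caller gets the untouched
/// OsString back and has to decide what to do with it.
fn strict_name(raw: OsString) -> Result<String, OsString> {
    raw.into_string()
}

/// CString is a nul-terminated bag of bytes, so an interior nul
/// in the input is rejected.
fn to_c_string(s: &str) -> Option<CString> {
    CString::new(s).ok()
}
```

With these, `parse_utf8(&[0x66, 0x80])` comes back as `None`, while `lossy_utf8` on the same kind of input silently swaps the bad byte for U+FFFD. That is exactly the decision the language pushes onto the developer. On Unix, a non-UTF-8 `OsString` for experimenting can be built with `std::os::unix::ffi::OsStringExt::from_vec`.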