Hacker News new | ask | show | jobs
by tatterdemalion 4001 days ago
Yes.

The Rust std library had to pick a string encoding, and it picked UTF-8 (which is really the best Unicode encoding). The String type is platform neutral and always UTF-8.

However, it does provide an OsString type, which on windows is UTF-16. Maybe there is a library - and if not, one could be written - targeting Windows only, and implementing stronger UTF-16 string processing on the OsString type.

EDIT: To be clear, Rust's trait system makes this very easy to do. You just define all the methods you want OsString to have in a trait WindowsString, and implement it for OsString, even though OsString is a std library type. One of the great things about Rust is that its trivial to use the std library as shared "pivot" which various third party libraries extend according to your use case.

1 comments

I believe Rust uses WTF-8 as an intermediate format for windowsy things (cheaper), but I'm not sure.
What is... oh... UTF-16, the gift that keeps on giving... this is, at the same time, utterly hilarious and horribly depressing:

https://simonsapin.github.io/wtf-8/

But there is actually prior art here - Java's contribution to perverse Unicode encodings is called "Modified UTF-8" and encodes every UTF-16 surrogate code unit separately.

http://docs.oracle.com/javase/6/docs/api/java/io/DataInput.h...