Hacker News
by yurish 4001 days ago
I don't know much about Rust or its standard library, so I have a question: if I want to develop Windows-only software in Rust, will I need to convert back and forth between UTF-16 and UTF-8 (or whatever Rust uses in other parts of the library)?
3 comments

Yes.

The Rust std library had to pick a string encoding, and it picked UTF-8 (which is really the best Unicode encoding). The String type is platform neutral and always UTF-8.

However, it does provide an OsString type, which on Windows can represent the platform's UTF-16 strings. Maybe there is a library - and if not, one could be written - targeting Windows only, and implementing stronger UTF-16 string processing on the OsString type.

EDIT: To be clear, Rust's trait system makes this very easy to do. You just define all the methods you want OsString to have in a trait WindowsString, and implement it for OsString, even though OsString is a std library type. One of the great things about Rust is that it's trivial to use the std library as a shared "pivot" which various third party libraries extend according to your use case.
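A minimal sketch of that extension-trait idea, assuming a made-up pair of methods (to_wide and wide_len are illustrative names, not part of std). On Windows it uses std's encode_wide; the non-Windows impl is only a lossy fallback so the sketch compiles elsewhere:

```rust
use std::ffi::OsString;

/// Hypothetical extension trait. The name comes from the comment above;
/// the methods are assumptions chosen for illustration.
trait WindowsString {
    /// The string as UTF-16 code units, the form Windows "wide" APIs expect.
    fn to_wide(&self) -> Vec<u16>;

    /// Number of UTF-16 code units (a provided method built on `to_wide`).
    fn wide_len(&self) -> usize {
        self.to_wide().len()
    }
}

#[cfg(windows)]
impl WindowsString for OsString {
    fn to_wide(&self) -> Vec<u16> {
        // `encode_wide` is std's Windows-only extension method on OsStr.
        use std::os::windows::ffi::OsStrExt;
        self.encode_wide().collect()
    }
}

#[cfg(not(windows))]
impl WindowsString for OsString {
    fn to_wide(&self) -> Vec<u16> {
        // Fallback so the sketch compiles off Windows; lossy for non-UTF-8 data.
        self.to_string_lossy().encode_utf16().collect()
    }
}
```

With the trait in scope, any OsString gains the new methods, e.g. OsString::from("hello").wide_len().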

I believe Rust uses WTF-8 as an intermediate format for windowsy things (cheaper), but I'm not sure.
What is... oh... UTF-16, the gift that keeps on giving... this is, at the same time, utterly hilarious and horribly depressing:

https://simonsapin.github.io/wtf-8/

But there is actually prior art here - Java's contribution to perverse Unicode encodings is called "Modified UTF-8" and encodes every UTF-16 surrogate code unit separately.

http://docs.oracle.com/javase/6/docs/api/java/io/DataInput.h...
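To make "encodes every UTF-16 surrogate code unit separately" concrete, here is a rough sketch of a Modified-UTF-8-style encoder. The function names are made up for illustration, and this is not Java's actual implementation. Each UTF-16 code unit gets its own 1 to 3 byte sequence, so a character outside the BMP takes 6 bytes instead of the 4 that standard UTF-8 uses:

```rust
/// Encode one UTF-16 code unit as its own 1-3 byte sequence.
/// (Hypothetical helper; surrogates are deliberately NOT combined.)
fn encode_unit(unit: u16, out: &mut Vec<u8>) {
    let u = unit as u32;
    if u == 0 {
        // Modified UTF-8 writes NUL as an overlong two-byte sequence.
        out.extend_from_slice(&[0xC0, 0x80]);
    } else if u < 0x80 {
        out.push(u as u8);
    } else if u < 0x800 {
        out.push(0xC0 | (u >> 6) as u8);
        out.push(0x80 | (u & 0x3F) as u8);
    } else {
        // Covers the rest of the BMP, including lone surrogate code units.
        out.push(0xE0 | (u >> 12) as u8);
        out.push(0x80 | ((u >> 6) & 0x3F) as u8);
        out.push(0x80 | (u & 0x3F) as u8);
    }
}

/// Encode a string unit by unit, the way the comment above describes.
fn modified_utf8(s: &str) -> Vec<u8> {
    let mut out = Vec::new();
    for unit in s.encode_utf16() {
        encode_unit(unit, &mut out);
    }
    out
}
```

For U+1F600 the UTF-16 form is the surrogate pair D83D DE00, which this sketch turns into two separate three-byte sequences, whereas standard UTF-8 encodes the same character as a single four-byte sequence.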

We have an OsString type (http://doc.rust-lang.org/stable/std/ffi/struct.OsString.html) to abstract over a native string in whatever encoding your platform has. Generally, things that interact with the OS use these, and they can convert to a UTF-8 String.
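A small sketch of that conversion, using two std methods (into_string and to_string_lossy). The wrapper function names are made up for illustration:

```rust
use std::ffi::{OsStr, OsString};

/// Strict conversion: returns the original OsString back as the error
/// if the native string is not valid Unicode.
fn to_utf8_strict(native: OsString) -> Result<String, OsString> {
    native.into_string()
}

/// Lossy conversion: invalid sequences become U+FFFD replacement characters.
fn to_utf8_lossy(native: &OsStr) -> String {
    native.to_string_lossy().into_owned()
}
```

The strict form suits cases where a mangled name would be a bug, while the lossy form is usually enough for display.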
Suppose I'm on Linux, but I want to interact with Windows stuff. (CIFS protocol, NTFS on-disk format, disassembler for Windows executables, Wine-like program, cross-compiler, etc.)

I'll be wanting UTF-16 support. Going the other way matters too; if I'm on Windows I may need UTF-32 (UCS-4) support.

Sure. That's not a problem. You can write any kind of string type you want, as a library, and convert between them. One of the nice things about Rust is that it's low-level enough that almost everything is a library anyway, so the language won't get in your way if you need SomeNicheString.
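As a rough illustration of "any kind of string type you want, as a library", here is a sketch of a hypothetical Utf16String newtype (the type and its to_utf8 method are assumptions, not an existing crate) that converts to and from Rust's UTF-8 strings on any platform:

```rust
use std::string::FromUtf16Error;

/// Hypothetical library type: a string stored as UTF-16 code units.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Utf16String(Vec<u16>);

impl From<&str> for Utf16String {
    /// UTF-8 to UTF-16 never fails, since a &str is always valid Unicode.
    fn from(s: &str) -> Self {
        Utf16String(s.encode_utf16().collect())
    }
}

impl Utf16String {
    /// UTF-16 to UTF-8 can fail, e.g. on an unpaired surrogate.
    fn to_utf8(&self) -> Result<String, FromUtf16Error> {
        String::from_utf16(&self.0)
    }
}
```

The asymmetry is the design point: going to UTF-16 is infallible, while coming back has to handle ill-formed input such as a lone surrogate.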
Since the full bullet point was “UTF-16 or UCS-2 support anywhere outside windows API compatibility routines” I'm assuming you'd get UTF-8 out of any high-level interface.