Hacker News new | ask | show | jobs
by burntsushi 3688 days ago
> In Rust a string is a sequence of unicode scalar values.

The representation of a Rust String in memory is guaranteed valid UTF-8. To me, a "sequence of Unicode scalar values" is an abstract description, because it could be implemented via UTF-8, UTF-16 or UTF-32.

> I personally find it unfortunate that they dictate the storage of it at the API level

It is extraordinarily convenient and provides a very transparent way to analyze the performance of string operations.

For transcoding, there is the in-progress `encoding` crate: https://github.com/lifthrasiir/rust-encoding

I note that Go does things very similarly (`string` is conventionally UTF-8) and it works famously for them. They have a much more mature set of encoding libraries, but they work the same as the equivalent libraries would work in Rust: transcode to and from UTF-8 at the boundaries. See: https://godoc.org/golang.org/x/text