by tialaramex 1465 days ago
A mixture of culture and technology.

Technologically, Rust's only built-in string type is the string slice, str, which you almost always meet as &str, a shared reference to a string slice. That is, you can't change the text through it (the reference isn't mutable), and it is both a pointer to the start of some UTF-8 and the length of that UTF-8 in bytes.
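To make the "pointer plus length" point concrete, here is a minimal sketch. The helper name parts is made up for illustration, and the two-machine-words size is how current Rust lays out &str in practice rather than something the comment above claims.

```rust
use std::mem::size_of;

/// Hypothetical helper: returns the (pointer, length in bytes) pair behind a `&str`.
fn parts(s: &str) -> (*const u8, usize) {
    (s.as_ptr(), s.len())
}

fn main() {
    // A &str is two machine words: a pointer and a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());

    let s = "héllo";
    let (_ptr, len) = parts(s);
    // The length counts bytes, not characters: 'é' takes two bytes in UTF-8.
    assert_eq!(len, 6);
    assert_eq!(s.chars().count(), 5);
}
```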

What encoding? Always UTF-8. Only UTF-8. Not "Well, it's kinda UTF-8 but..." it's just always UTF-8. This moves the burden to a single place, your text decoding code, to do things correctly, and great news - the entire world is moving to UTF-8, so you're on a downhill gradient where every week this works better without you lifting a finger.
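As a rough sketch of that "single place": the standard library's std::str::from_utf8 checks the bytes once, and everything downstream can rely on the result being valid UTF-8. The function name decode is hypothetical; a real decoder might also transcode from other encodings, which this does not attempt.

```rust
use std::str::Utf8Error;

/// Hypothetical decoding boundary: the one place where raw bytes become text.
fn decode(bytes: &[u8]) -> Result<&str, Utf8Error> {
    // Validates the bytes and borrows them; no copy is made.
    std::str::from_utf8(bytes)
}

fn main() {
    // 0xC3 0xA9 is the UTF-8 encoding of 'é'.
    assert_eq!(decode(b"caf\xc3\xa9"), Ok("café"));
    // 0xFF never appears in valid UTF-8, so this is rejected up front.
    assert!(decode(b"\xff\xfe").is_err());
}
```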

That reference knowing the length is brilliant. Trimming whitespace off a string? You can just make another immutable reference to the smaller trimmed string. Zero copies. Slicing a URL up into components? You can do that too, zero copies. And yet it's all memory safe.
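Here is a small sketch of the trimming and URL-slicing cases. The helpers split_url and is_subslice are invented for this example, and the URL splitting is deliberately naive (no ports, queries or validation). The point is only that every piece is a view into the original buffer.

```rust
/// Hypothetical helper: splits "scheme://host/path" into (scheme, host, path).
/// Every returned piece borrows from `url`; nothing is copied.
fn split_url(url: &str) -> Option<(&str, &str, &str)> {
    let (scheme, rest) = url.split_once("://")?;
    let (host, path) = match rest.find('/') {
        Some(i) => rest.split_at(i),
        None => (rest, ""),
    };
    Some((scheme, host, path))
}

/// Hypothetical helper: true if `inner` lies within the bytes of `outer`.
fn is_subslice(outer: &str, inner: &str) -> bool {
    let start = outer.as_ptr() as usize;
    let p = inner.as_ptr() as usize;
    p >= start && p + inner.len() <= start + outer.len()
}

fn main() {
    let raw = String::from("  https://example.com/a/b  ");

    // trim() returns a shorter &str pointing into `raw`.
    let trimmed = raw.trim();
    assert!(is_subslice(&raw, trimmed));

    let (scheme, host, path) = split_url(trimmed).unwrap();
    assert_eq!((scheme, host, path), ("https", "example.com", "/a/b"));
    // The components are still views into the same heap buffer.
    assert!(is_subslice(&raw, host));
    assert!(is_subslice(&raw, path));
}
```

The borrow checker ties each piece's lifetime to the original string, so the slices can't outlive the buffer they point into. That is where the memory safety comes from.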

Now, Chromium is not some raw firmware for a $1 micro-controller, so it has library types like Rust's alloc::string::String (you can just name it "String" in normal Rust code, but that is its full name). As its presence in alloc suggests, this is an allocating string type: you can concatenate them, you can make them by formatting a bunch of other variables, the default ones are empty, the data goes on your heap and so on. But String implements Deref<Target = str> (and AsRef<str> as well), which means if what you've got is a String, and what you're doing is calling a function that wants &str, you pass a reference to the String, Rust is OK with that, and it costs nothing at runtime. Why? Because that &str is just two of the three fields of the String you had, the pointer into the heap and the length (the third is the capacity). It's easy.
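A minimal sketch of that, with a made-up function shout standing in for "a function that wants &str". Borrowing a String as &str gives you the same pointer and the same length, no new allocation.

```rust
/// Hypothetical function that only needs to read text, so it asks for &str.
fn shout(s: &str) -> String {
    s.to_uppercase()
}

fn main() {
    // The default String is empty; the text lives on the heap once we add some.
    let mut owned = String::new();
    owned.push_str("hello");
    owned += ", ";
    owned += &format!("{}!", "world");

    // Borrowing the String as &str reuses its pointer and length.
    let view: &str = &owned;
    assert_eq!(view.as_ptr(), owned.as_ptr());
    assert_eq!(view.len(), owned.len());

    // The same function accepts a borrowed String or a literal.
    assert_eq!(shout(&owned), "HELLO, WORLD!");
    assert_eq!(shout("literal"), "LITERAL");
}
```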

Rust has lots of other types for stuff like foreign function interfaces, such as CStr and CString (for the C-style NUL-terminated array of bytes which might be text), but your pure Rust code shouldn't care about those. Often it can say (unsafely) "Look, the C++ promises this is UTF-8, we'll take their word for it" or "I only need it to have bytes in it, let's make it a [u8] and we're done".
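Roughly, the options at that boundary look like the sketch below. The function names are hypothetical. The calls are the standard library's CStr::to_str (checked), CStr::to_bytes, and std::str::from_utf8_unchecked (the "take their word for it" route). In real FFI the CStr would come from a raw pointer handed over by the C or C++ side; here a CString stands in for it.

```rust
use std::ffi::{CStr, CString};
use std::str::Utf8Error;

/// Checked route: validate that the C string really is UTF-8.
fn checked(c: &CStr) -> Result<&str, Utf8Error> {
    c.to_str()
}

/// "I only need bytes": the contents without the trailing NUL.
fn raw_bytes(c: &CStr) -> &[u8] {
    c.to_bytes()
}

/// "The C++ promises this is UTF-8".
///
/// # Safety
/// The caller must guarantee the bytes are valid UTF-8.
unsafe fn trusted(c: &CStr) -> &str {
    // No validation happens here; a broken promise is undefined behaviour.
    unsafe { std::str::from_utf8_unchecked(c.to_bytes()) }
}

fn main() {
    // Stand-in for a string received from the C/C++ side.
    let from_c = CString::new("hello").expect("no interior NUL bytes");

    assert_eq!(checked(&from_c), Ok("hello"));
    assert_eq!(raw_bytes(&from_c), &b"hello"[..]);
    // Sound only because we know "hello" is valid UTF-8.
    assert_eq!(unsafe { trusted(&from_c) }, "hello");
}
```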

Culturally, Rust programmers write &str when that would do. There's strong cultural pressure not to write String when you really mean &str, and the compiler won't let you write &str if you needed String. So this results in less thunking of the sort complained about in C++.