by vlovich123 1464 days ago
> In C++ "dropping" ownership as you describe is no big deal, the C++ design doesn't care, but in Rust if you actually drop(foo) it's gone

I think you’ve built a straw man of my argument and then argued with that.

Clearly I meant that it seems possible that a sufficiently complicated call stack could still be set up to jump between needing the owned String type and the borrowed &str type. That’s what I meant by dropping ownership, as that’s what’s happening in the C++ code when you go between char*/string (the API is dropping its need for ownership). The argument of “If you've got the owned String, and I needed an owned String, I should ask for your owned String, and we're done” is weak because that same argument would apply to C++ code, and yet the code still ended up that way when you pasted together components in a very large code base. Now maybe it’s a bit simpler in Rust, because in C++ you have string, string&, const string&, and const char*, and doing that antipattern that happened in C++ just wouldn’t be ergonomic in Rust. Maybe. But that feels like a very thin argument and not “this is impossible in Rust”.


I am definitely not arguing that it's impossible but my experiences with Rust lead me to think you've significantly underestimated how important those ergonomics are.

The "I should ask for your owned String" argument does not apply equally well in C++ because of a crucial design infelicity in C++. Your caller may well not have an owned std::string.

In C++ raw char pointers are totally a thing. Because std::string is a late addition (if you learned C++ in the early 1990s a "string" class was maybe an interesting exercise, not a library type) the string literals aren't a built-in string type, and much of the API isn't shaped for such a type either.

Now, the effect is those are (sometimes) owning pointers, it is possible I own some C++ string in this sense, and all I have is a pointer into it. If I give you that pointer, it's not because I didn't give you the owned string, that pointer is my owned string. You want a std::string and there's no reason I would have one at all.

You can mutate these strings, but of course you can't extend them because you've got no way to know how to communicate with the allocator, maybe they live on the stack, or in a private heap. At the time this seemed like a good idea, today we don't think so.

How is this solved in Rust though? The only thing Rust solves is that you don't accidentally hang onto a bad reference, but I'm failing to see how &str/String is meaningfully different from char*/string since the same issue applies (char* = borrowed, string = owned). My Rust is rusty so apologies for the pun & any syntax errors. Let's say you have the following:

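    // some process managed by team a
    fn caller1(/* ... */) {
        let s = String::new();
        level1Callee(&s); // &String coerces to &str
    }

    // some process managed by team b
    fn caller2(s: &str) {
        level1Callee(s);
    }

    // library 1 by team c
    fn level1Callee(iDontNeedOwnershipOrDoI: &str) {
        // allocates and copies, because the next layer demands a String
        level2Callee(iDontNeedOwnershipOrDoI.to_owned())
    }

    // library 1 by team d
    fn level2Callee(iNeedStrongOwnershipBecauseIMutateTheStringAnyway: String) {
        // and back to a borrow again
        level3Callee(&iNeedStrongOwnershipBecauseIMutateTheStringAnyway)
    }

    // level3Callee wasn't spelled out above; assume it only borrows
    fn level3Callee(_s: &str) {}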

This is roughly what happened in Chrome as I understand it (except multiple times, because of independent libraries that didn't notice that they probably should have just made a copy to begin with). Let's pretend the codebase had been written originally in Rust. How does Rust keep this problem from coming up? This didn't happen in Chrome because of ownership. It came up organically because years of refactoring obfuscated things. For example, level2Callee started out not needing strong ownership but then started calling a library that did (refactoring a complex codebase is very hard & time consuming). Rinse & repeat for many years. Now maybe Rust tooling is better able to point out the unnecessary acquiring/dropping of the strings, but that seems unlikely: the problem is statically very difficult to lint around.

Chromium had a few problems; they're public, so we can go read the changes made and look at the context.

In a lot of places in Chromium the impedance choices are arbitrary. You have a raw pointer but need a string or vice versa. Ownership isn't the problem, so in Rust you literally just always choose &str for these APIs and pay nothing. A team who design their API taking &String in this situation get the same treatment as a team who name all their types Data1, Data2, Data3 and so on. Somebody senior fetches the water spray, "No. Bad programmer".
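To make the "always choose &str and pay nothing" point concrete, here is a minimal sketch. The function name count_words is made up for illustration; the point is only that one borrowed-parameter API serves callers holding a String, a literal, or a sub-slice, with no allocation at the call site.

```rust
// Hypothetical API: it only reads the text, so it borrows a &str.
fn count_words(text: &str) -> usize {
    text.split_whitespace().count()
}

fn main() {
    let owned: String = String::from("some owned text");
    let literal: &'static str = "a string literal";

    // &String deref-coerces to &str: no allocation, no copy.
    assert_eq!(count_words(&owned), 3);
    // A literal is already a &str.
    assert_eq!(count_words(literal), 3);
    // A sub-slice of either one works as well.
    assert_eq!(count_words(&owned[5..]), 2);
}
```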

You might be outraged, surely C++ programmers also never get this wrong. But nope, happens all the time as Chromium illustrates.

Added: One cause that shows up in my review is this:

C++ strings know how long they are. The raw pointer does not. As a result if we have lots of people asking if their thing is "some text" it's tempting to demand they give us a string, since if the string's length isn't 9 we don't need to look at the text itself. It's an optimisation! Rust's &str knows how long it is.
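A small sketch of that length-first check, written against a borrowed &str. The function name is_some_text is hypothetical, and the explicit length test is spelled out only for illustration, since plain == on &str is free to do the same thing internally.

```rust
// Hypothetical check from the paragraph above: is this text "some text"?
fn is_some_text(candidate: &str) -> bool {
    const WANTED: &str = "some text";
    // The length is carried in the &str itself, so this test never reads the bytes.
    if candidate.len() != WANTED.len() {
        return false;
    }
    candidate == WANTED
}

fn main() {
    assert_eq!("some text".len(), 9);
    assert!(is_some_text("some text"));
    // Rejected on length alone.
    assert!(!is_some_text("some other text"));
    // Same length, different bytes: falls through to the comparison.
    assert!(!is_some_text("same size"));
}
```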

Again I’m failing to see the distinction. Why did that team need a string? Presumably it’s mutating, right? &mut str would let you mutate the existing characters (similar to char*) but it doesn’t give you permission to resize (since doing so obviously might involve a reallocation and change underlying pointers referenced elsewhere).

> A team who design their API taking &String in this situation get the same treatment as a team who name all their types Data1, Data2, Data3 and so on. Somebody senior fetches the water spray, "No. Bad programmer". You might be outraged, surely C++ programmers also never get this wrong. But nope, happens all the time as Chromium illustrates.

So the thrust of your claim is that Rust programmers are better. The more generous interpretation I’ll read here is that Rust has stronger conventions here. Even so, I don’t see it. The problems are still the same. I think there’s a hyper-focus on ownership when it has nothing to do with this. In C++, const char* (and now string_view) == &str. Chrome developers were going from &str to String and back many times. If Rust had some way to convert &mut str to String without a copy if none was needed, then I think that might apply.
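For reference, a sketch of where the cost sits in that round trip, comparing buffer addresses. The helper takes_owned is hypothetical. Going from &str to String allocates and copies, while handing an existing String to a callee that wants ownership moves the same buffer.

```rust
// Hypothetical callee that really does need ownership.
fn takes_owned(s: String) -> String {
    s
}

fn main() {
    let original = String::from("configuration");
    let original_ptr = original.as_ptr();

    // &str -> String: to_owned() allocates a fresh buffer and copies the bytes.
    let borrowed: &str = &original;
    let copy = borrowed.to_owned();
    assert_ne!(copy.as_ptr(), original_ptr);

    // String -> String: a move hands over the existing buffer, no copy of the text.
    let moved = takes_owned(original);
    assert_eq!(moved.as_ptr(), original_ptr);
}
```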

> Again I’m failing to see the distinction. Why did that team need a string? Presumably it’s mutating, right?

As a Rust programmer this is what I'd expect, because that's what alloc::string::String is for, but this is not how std::string is used in C++ & especially not how it was used at that time.

So I spent some more time reading. Beyond signifying ownership, std::string has other properties that the raw pointer char * does not have, which influence the C++ programmer and especially the enthusiastic but perhaps less experienced C++ programmer:

1. It's a real C++ type whereas char * is left over from C

2. Unlike char * the std::string remembers the length of the string which also speeds up equality comparison (we know "classification" != "class" from the length before we even look at the text data)

3. std::string has a "Small String Optimisation" which will be emphasised repeatedly to you by C++ gurus. This means small local strings don't need heap space which is good. So... you should use std::string?

Now. If we compare &str and alloc::string::String:

1. Both Rust types, not left over from some prior language

2. Both know how long they are

3. Neither has "Small string optimisation". You can do this trick (oh boy and how) in Rust, but Rust's standard library intentionally does not provide it and nobody's public APIs expose such a thing.
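The three points above can be poked at directly. This is only a sketch: the helper text_is_out_of_line is made up, and the three-word size of String reflects how current standard library builds lay it out, not a documented guarantee.

```rust
use std::mem::size_of;

// True if the text bytes live outside the String value itself,
// i.e. the String is not storing them inline.
// (Takes &String on purpose: we need the address of the String value.)
fn text_is_out_of_line(s: &String) -> bool {
    let start = s as *const String as usize;
    let end = start + size_of::<String>();
    let data = s.as_ptr() as usize;
    data < start || data >= end
}

fn main() {
    // &str is a pointer plus a length.
    assert_eq!(size_of::<&str>(), 2 * size_of::<usize>());
    // String is pointer, capacity and length in current std (assumption, see above).
    assert_eq!(size_of::<String>(), 3 * size_of::<usize>());

    // Even a two-byte String keeps its text in a separate allocation.
    let short = String::from("hi");
    assert_eq!(short.len(), 2);
    assert!(text_is_out_of_line(&short));
}
```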

> &mut str would let you mutate the existing characters (similar to char*) but it doesn’t give you permission to resize (since doing so obviously might involve a reallocation and change underlying pointers referenced elsewhere).

Yes it would, and this exists but I've rarely seen it put to any use.
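For anyone who hasn't run into it, a tiny sketch of what &mut str allows. The helper name shout is hypothetical; make_ascii_uppercase is a standard method on str. The callee can rewrite bytes in place but has no way to grow or shrink the text.

```rust
// Hypothetical helper: mutates the characters in place but cannot resize the text.
fn shout(text: &mut str) {
    text.make_ascii_uppercase();
}

fn main() {
    let mut owned = String::from("class");
    let len_before = owned.len();

    // &mut String deref-coerces to &mut str.
    shout(&mut owned);

    assert_eq!(owned, "CLASS");
    assert_eq!(owned.len(), len_before);
}
```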

> The more generous interpretation I’ll read here is that Rust has stronger conventions here.

Ultimately yes, the conventions are stronger, a cultural difference. You can go look for yourself, at both the sprawling vastness of public Rust libraries and Chromium's own APIs in that era, which often take string despite having no interest in ownership.

Chromium has a map (of configuration parameters) inside it, in which the keys are std::string. If you understand this as an owning object for mutation that sounds insane but if you just think it's a convenient object that knows how long the text is and keeps shorter text out of the heap it's awesome. Right?

Of course, you can't just compare a char * to a std::string, the map has no idea that would be possible, so you make a std::string from your char * and compare that. Don't worry, there are only a few dozen configuration parameters to check. What do you mean this hash lookup now incurs a heap allocation?
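For comparison, a sketch of the same shape in Rust. The parameter names and values are invented. A HashMap keyed by String can be queried with a plain &str, because HashMap::get accepts any borrowed form of the key, so the lookup itself doesn't build a String.

```rust
use std::collections::HashMap;

// Hypothetical configuration table keyed by owned Strings.
fn build_config() -> HashMap<String, u32> {
    let mut config = HashMap::new();
    config.insert(String::from("max_connections"), 8);
    config.insert(String::from("timeout_seconds"), 30);
    config
}

// The query key is a borrowed &str; no String is constructed for the lookup.
fn lookup(config: &HashMap<String, u32>, key: &str) -> Option<u32> {
    config.get(key).copied()
}

fn main() {
    let config = build_config();
    assert_eq!(lookup(&config, "timeout_seconds"), Some(30));
    assert_eq!(lookup(&config, "no_such_parameter"), None);
}
```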