by glouwbug 2233 days ago
Imagine if -fanalyzer was like Rust's borrow checker
4 comments

You Rust borrow checker requires special annotations and restrictions put on the code to do its job. I don't think you could something like that automatically on a C or C++ full codebase without having to manually annotate and refactor it somewhat. There are many common (and safe) C and C++ patterns that would be outright rejected by Rust's borrow checker, for instance initializing a structure or array partially if you're sure that nobody is going to use the initialized portion. Or having multiple mutable pointers/reference to the same object.

You could do something like that at runtime though, but then you have Valgrind, basically.

> for instance initializing a structure or array partially if you're sure that nobody is going to use the initialized portion. Or having multiple mutable pointers/reference to the same object.

Rust supports MaybeUninit<> for the former example, and unsafe raw pointers for the latter. It needs unsafe because these patterns are not safe in the general case, and absent an actual proof of correctness embedded in the source code, a static analysis pass can only deal with the general case.
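For readers who have not seen the two escape hatches mentioned above, here is a minimal sketch. The function names `sum_first_n` and `bump_twice` are made up for illustration; only `MaybeUninit` and raw pointers come from the comment. Note that in both cases the `unsafe` part is the read or dereference, not the setup.

```rust
use std::mem::MaybeUninit;

/// Fills only the first `n` slots of a fixed buffer and sums them.
/// The remaining slots stay uninitialized and are never read.
fn sum_first_n(n: usize) -> u32 {
    assert!(n <= 8);
    // An array of MaybeUninit does not have to be initialized.
    let mut buf: [MaybeUninit<u32>; 8] = [MaybeUninit::uninit(); 8];
    for (i, slot) in buf.iter_mut().take(n).enumerate() {
        slot.write(i as u32 * 10);
    }
    // SAFETY: exactly the first `n` slots were written above.
    buf[..n]
        .iter()
        .map(|slot| unsafe { slot.assume_init_read() })
        .sum()
}

/// Two mutable raw pointers to the same object. Creating them is safe;
/// dereferencing them requires `unsafe`.
fn bump_twice(value: &mut i32) {
    let a: *mut i32 = value;
    let b: *mut i32 = a; // a second alias; raw pointers are Copy
    // SAFETY: both pointers derive from a live `&mut i32`, and the
    // accesses happen one after the other on a single thread.
    unsafe {
        *a += 1;
        *b += 1;
    }
}
```

Calling `sum_first_n(3)` writes 0, 10 and 20 and reads back only those three slots. The compiler does not verify either `SAFETY` comment; that proof obligation stays with the programmer.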

That's my point though, in both cases the developer needs to add additional syntax to make the intent clear. "Naive" Rust code that tries to do that stuff is rejected by the compiler.

I've expressed myself poorly in my original comment and apparently it looks like I was criticizing Rust but I wasn't. I was just pointing out that safety didn't come "for free" by toggling a compiler flag, you have to change the way you code some things. If C and C++ were to become safe languages, code would need to be rewritten using things like MaybeUninit, split_at_mut, RefCell etc...
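As a rough illustration of what "rewritten using things like split_at_mut, RefCell" can look like, here is a small sketch. The functions `swap_halves` and `shared_counter` are hypothetical examples, not taken from any real codebase.

```rust
use std::cell::RefCell;

/// Swaps the two halves of a slice element by element. `split_at_mut`
/// hands out two mutable slices that are known not to overlap.
fn swap_halves(data: &mut [i32]) {
    let mid = data.len() / 2;
    let (left, right) = data.split_at_mut(mid);
    for (l, r) in left.iter_mut().zip(right.iter_mut()) {
        std::mem::swap(l, r);
    }
}

/// Two shared handles mutate one counter. RefCell moves the aliasing
/// check from compile time to run time.
fn shared_counter() -> (i32, bool) {
    let counter = RefCell::new(0);
    let a = &counter;
    let b = &counter;
    *a.borrow_mut() += 1;
    *b.borrow_mut() += 1;

    // Two mutable borrows alive at once are detected dynamically.
    let guard = a.borrow_mut();
    let conflict = b.try_borrow_mut().is_err();
    drop(guard);

    (counter.into_inner(), conflict)
}
```

In C the first function would simply take two pointers into the same array, and the second would be two plain pointers to one `int`. The Rust versions express the same intent, but the non-overlap and the "one writer at a time" rules are spelled out in the code.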

I would dispute that those common patterns are indeed safe. Even if you can argue that they are safe when first written, code changes, and a later change can suddenly break your preconditions if they aren't enforced in the code itself.

C codebases then follow certain defensive programming customs to avoid reading uninitialized or out-of-bounds memory, at the cost of some performance. This is the right trade-off in C but, funnily enough, the more restrictive borrow checker has the opposite effect, as you can give out immutable and mutable references with wanton abandon because they get checked for unsafe behavior. It's the same difference as between a gun where the best practice is to keep it unchambered at all times to avoid the risk of a misfire, and a more modern gun with a safety: it's one more thing to think about but it actually smooths the operation.

I'd describe the pattern in slightly different terms: when done right, restrictions in a programming language (or library/framework) are liberating for the programmer.

The restriction of immutability spares the programmer from worrying about whether unknown parts of the codebase are going to decide to mutate an object.

JavaScript's single-thread restriction (not counting web-workers) closes the door on all manner of nasty concurrent-programming problems that can arise in languages that promote overuse of threads. (Last I checked, NetBeans uses over 20 threads.)

Back to the example at hand, C has no restrictions, but that hobbles the programmer when it comes to reasoning about the way memory is handled in a program. It's completely free-form. Rust takes a more restrictive approach, and even enables automated reasoning. (Disclaimer: I don't know much about Rust.)

> Rust borrow checker requires special annotations and restrictions put on the code to do its job.

This is a good thing, because it makes lifetimes and ownership explicit and visible in the code. It serves a similar purpose to type annotations in function signatures.
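A small sketch of a lifetime annotation acting like a type annotation. The names `first_match` and `demo` are invented for this example. The `'a` in the signature tells both the compiler and the reader that the result borrows from `haystack` and not from `needle`.

```rust
/// Returns the first whitespace-separated word containing `needle`.
/// The `'a` ties the result to `haystack` only.
fn first_match<'a>(haystack: &'a str, needle: &str) -> Option<&'a str> {
    haystack
        .split_whitespace()
        .find(|word| word.contains(needle))
}

fn demo() -> Option<String> {
    let text = String::from("borrow checker annotations");
    let found = {
        let needle = String::from("check"); // dropped at the end of this block
        first_match(&text, &needle)
    };
    // `found` is still usable here because, per the signature,
    // it never borrowed from `needle`.
    found.map(str::to_owned)
}
```

If the signature instead tied the result to both arguments, `demo` would stop compiling, because `needle` would have to outlive `found`.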

> Or having multiple mutable pointers/reference to the same object

Sure, you can have that with `unsafe`. And this is a good thing, because having multiple mutable pointers to the same object is at best a bad coding practice that leads to unsafe code, and you should avoid it in any language, including the ones with GC. Working with codebases where there are multiple distant things mutating shared stuff is a terrible experience.

If a C/C++ version of "borrow checker" could mark such (anti)patterns at least as warnings, that would bring a lot of value.

He wasn't criticizing Rust, he was just stating facts.

I read it differently, because he started with "You(r) Rust borrow checker", automatically putting his point in opposition. But now, after reading it without this "You" at the beginning, I agree it was neutral.

My guess is that comment was made on a phone or tablet. It has a lot of small, autocorrect-looking mistakes. Other than the "You", for example there is one part where "initialized" is used where "uninitialized" is clearly intended.

I wish I could use that excuse. I just have the habit of posting first and then proofreading and editing, but noprocrast kicked in and then I switched to something else and now I can't edit it anymore.

However, if I sounded like I wanted to belittle Rust, I really expressed myself poorly. I love the language and hope it'll eventually become the new C++. If anything I was attempting to make the opposite point: you can't fix C/C++'s flaws with a smarter compiler. We're not two releases of GCC away from having safe C without having to change anything.

Can you explain why multiple mutable pointers is bad practice?

I understand the benefits and the risks of them, and understand how Rust prevents both, but I don't yet understand why it's bad practice, and am interested to learn why.

The one that affects you as a programmer most is iterator invalidation. The iterator borrows from the vector, you mutate the vector, the iterator blows up. Simple really. But a lot of code is like this. Borrow from a hashmap, insert into the hashmap, the slot gets moved around and your pointer is now invalid. That’s just vectors and hashmaps; imagine the possibilities in a much more complex data structure.
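To make the vector and hashmap cases concrete, here is a sketch with two hypothetical functions. The commented-out loop is the shape that invalidates iterators in C++; Rust rejects it at compile time, so the working code has to finish reading before it mutates.

```rust
use std::collections::HashMap;

/// Appends a doubled copy of every even element.
fn append_doubled_evens(v: &mut Vec<i32>) {
    // Rejected by the borrow checker (error E0502): the iterator holds
    // a shared borrow of `v` while `push` needs a mutable one.
    //
    // for x in v.iter() {
    //     if x % 2 == 0 { v.push(x * 2); }
    // }

    // Accepted: collect the new values first, then mutate.
    let extra: Vec<i32> = v
        .iter()
        .filter(|x| **x % 2 == 0)
        .map(|x| x * 2)
        .collect();
    v.extend(extra);
}

/// Looks up "a", then inserts "b". The value is copied out so that no
/// reference into the map is alive when the insert may rehash it.
fn insert_after_lookup(m: &mut HashMap<String, u32>) -> u32 {
    let old = m.get("a").copied().unwrap_or(0);
    m.insert("b".to_string(), old + 1);
    old
}
```

Keeping a `&u32` from `m.get("a")` alive across the `insert` would be the hashmap equivalent of the rejected loop.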

There are compiler optimisations you can do if the compiler knows about aliasing, but that’s not so much a software authorship problem. There are some curly problems with passing aliased mutable pointers to a function written for non-aliased inputs, like memcpy and, I imagine, quite a lot of other code.
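The memcpy case maps fairly directly onto Rust's slice API, sketched below with two made-up wrapper functions. A `&mut [u8]` and a `&[u8]` cannot overlap in safe code, so the memcpy-style call needs no aliasing caveat, while the overlapping case has its own memmove-style method.

```rust
/// memcpy-like: the types guarantee `dst` and `src` are disjoint.
/// `copy_from_slice` panics if the lengths differ.
fn copy_disjoint(dst: &mut [u8], src: &[u8]) {
    dst.copy_from_slice(src);
}

/// memmove-like: source and destination ranges overlap inside one
/// buffer, so the copy goes through `copy_within` instead.
fn shift_right_by_one(buf: &mut [u8]) {
    let len = buf.len();
    if len > 1 {
        buf.copy_within(0..len - 1, 1);
    }
}
```

Trying to call `copy_disjoint(&mut buf[1..], &buf[..3])` on a single buffer does not compile, which is the aliasing mistake that C leaves to the caller to avoid.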

But common to all of these things is that it’s pretty hard to figure out if the programmer is the one who has to track the aliasing. In hashmap pointer invalidation, your code might work perfectly for years until some input comes along and finally triggers a shift or a reallocation at the exact right time. (I know this — I recently had to write a lot of C and these are the kinds of issues you get even after you implement some of Rust’s std APIs in C.)

Is this still bad practice if the container can detect when this happens, like Java's? Not saying that it always has to throw like Java, I could imagine implementing a weak iter which we'd check before each operation.
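One way to read the "weak iter" idea is a cursor that holds no borrow at all, only an index and a modification count that is checked on every step. The sketch below is purely illustrative: `CountedVec`, `Cursor` and `Stale` are invented names, and this is not how Java's collections or any Rust library are actually implemented.

```rust
/// Error returned when the container changed under a cursor.
#[derive(Debug, PartialEq)]
struct Stale;

/// Hypothetical vector wrapper that counts structural modifications.
struct CountedVec<T> {
    items: Vec<T>,
    version: u64,
}

/// A "weak" cursor: no borrow, just a position and the version seen
/// when the cursor was created.
struct Cursor {
    index: usize,
    version: u64,
}

impl<T: Clone> CountedVec<T> {
    fn new() -> Self {
        CountedVec { items: Vec::new(), version: 0 }
    }

    fn push(&mut self, item: T) {
        self.items.push(item);
        self.version += 1;
    }

    fn cursor(&self) -> Cursor {
        Cursor { index: 0, version: self.version }
    }

    /// Ok(Some(item)) for the next element, Ok(None) at the end,
    /// Err(Stale) if the container was modified since `cursor()`.
    fn next(&self, cursor: &mut Cursor) -> Result<Option<T>, Stale> {
        if cursor.version != self.version {
            return Err(Stale);
        }
        let item = self.items.get(cursor.index).cloned();
        if item.is_some() {
            cursor.index += 1;
        }
        Ok(item)
    }
}
```

This trades the compile-time rejection for a run-time check on every step, so the mistake is still only found when the offending path actually executes.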
I feel like this is something common C++ developers know, and it's not worth all the baggage to tag it. You just control the iterator inside the loop instead of in the loop declaration.

Because it leads to hard-to-understand code.

If N unrelated (or loosely related) things can mutate the same object, then you get an O(x^N) explosion of potential mutation orders, and in order to understand that, you need to understand all the (sometimes complex) time-relationships between these N objects. This gets even worse when some of these objects are also pointed to from M other objects...

On the flip side, in case of using a simple unique_ptr (or a similar concept), this trivially reduces to a single sequence of modifications.

Don't we in Rust still conceptually modify the vector multiple times, just through different means (usually a generational index or something)?

> Sure you can have that with `unsafe`

The parent was talking about the borrow checker so I only was talking about safe Rust code. Obviously if you consider that the entire C/C++ codebase is in a big unsafe {} block it'll work... because it won't do anything at all.

From Rust docs:

> It's important to understand that unsafe doesn't turn off the borrow checker or disable any other of Rust's safety checks: if you use a reference in unsafe code, it will still be checked. The unsafe keyword only gives you access to these four features that are then not checked by the compiler for memory safety.

One of those four features is dereferencing raw pointers, and unlike references, raw pointers are not checked by the borrow checker. So you could bypass the borrow checker using unsafe code in a way, though most probably you should not.

> This is a good thing, because it makes lifetimes and ownership explicit and visible in the code.

No, it is additional burden. If it was possible to do it without annotations, you bet we would do it!

It's only a burden to the extent that type annotations are a burden. It's definitely possible (including in Rust) to do away with both, but that has downsides of its own.

I am not saying you cannot write code without them, but that you cannot do away with them without losing what they bring.

While it isn't at Rust level, that doesn't stop Google and Microsoft from trying.

"Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spring 2019"

https://www.youtube.com/watch?v=EeEjgT4OJ3E

> You Rust borrow checker requires special annotations and restrictions put on the code to do its job. I don't think you could something like that automatically on a C or C++ full codebase without having to manually annotate and refactor it somewhat.

What about with a constrained (not necessarily general purpose) AI with the expertise of Scott Meyers, Andrei Alexandrescu, Herb Sutter and Alexander Stepanov?

That is what Microsoft and Google are trying to do with C++ Lifetime Profile.

https://herbsutter.com/2018/09/20/lifetime-profile-v1-0-post...

"Update on C++ Core Guidelines Lifetime Analysis. Gábor Horváth. CoreHard Spring 2019"

https://www.youtube.com/watch?v=EeEjgT4OJ3E

While it might never be Rust-like due to language semantics, it is way better than not having anything.

I think Rice's theorem means that you can't really do that without restricting/annotating semantics like Rust does.

No need to imagine, it's becoming reality: https://internals.rust-lang.org/t/c-lifetime-profile-1-0-a-k...