Hacker News new | ask | show | jobs
by sanxiyn 1887 days ago
No, borrow checker's formal rules are rather simple. Here it is:

It is an error to access a place, when an access conflicts with a loan, and the loan is live.

1 comments

You're describing the desired behavior of the compiler, but not what it actually implements.

In fact there are an infinite number of correct Rust programs which will never access memory incorrectly but which will still be rejected by the compiler, for the simple reason that Rust's authors, talented though they may be, have made no progress at all at solving the Halting Problem.

What Rust actually accepts is a subset of correct programs. And this subset is somewhat informally defined as "whatever the compiler could manage to prove". Real code hits against this limit occasionally, and when it does there's really no option other than "join the Rust team" or "try voodoo".

There are two problems here. Rust accepts a subset of correct programs that involve borrowing, that's absolutely true, but it gives you the tools to get around that: you either use Arc<Mutex<T>, where now you are in the same space that any refcounted GC language is in (with the extra flexibility that if you can "choose your guarantees"[1]), or use unsafe which lets you dereference memory freely in the same way that you would in C or C++.

The unstated problem is that there are patterns from other languages that are actually invalid and Rust is correctly denying, but not properly communicating it, which causes frustration. You can always make the argument that the failure modes are benign or rare, and for those cases noone is stopping you from actually using unsafe.

[1]: https://manishearth.github.io/blog/2015/05/27/wrapper-types-...

> The unstated problem is that there are patterns from other languages that are actually invalid

And this is why this argument persists. This is, to people outside the community, a semantic evasion:

1. It relies on a Rust-internal definition for "actually valid" (conforming to Rust's specific set of provability requirements) that doesn't correspond to what the rest of the world views as "correct" (not behaving incorrectly). Think about stuff like allocate-through-a-session-and-free-in-a-block paradigms (Apache was famous for this), or run-once-and-exit, or garbage collection, etc... Those things aren't "invalid" in any reasonable sense, they're just not what Rust programs do.

2. The definition for "valid" is (and this is my point above) entirely Rust-internal and ad hoc. It's not that we refuse to conform to your rules, really, it's that we don't know what they are!

There's an anecdote about a similar effect in the article too:

> Fun fact: while at Mozilla I heard multiple anecdotes of [very intelligent] Firefox developers thinking they had found a bug in Rust's borrow checker because they thought it was impossible for a flagged error to occur." However, after sufficient investigation the result was always (maybe with an exception or two because Mozilla adopted Rust very early) that the Rust compiler was correct and the developer's assertions about how code could behave was incorrect. In these cases, the Rust compiler likely prevented hard-to-debug bugs or even exploitable security vulnerabilities. I remember one developer exclaiming that if the bug had shipped, it would have taken weeks to debug and would likely have gone unfixed for years unless its severity warranted staffing.

Sometimes people think that they're right and that the compiler is wrong, but often, (not not necessarily always!) it's that they forgot some important piece of context.

(Oh, and Rust absolutely can do "allocate in a block and free at once" stuff, or "run once and exit" stuff...)

Rules are actually simple and not ad hoc. I in fact think Rust documentation should state rules and make them more prominent. In my experience, knowing the rules does not help you to keep them, but at least we will get less complaints of the sort "we don't know what the rules are".

To recap, here are the rules:

It is an error to access a place, when an access conflicts with a loan, and the loan is live. Access means source code construct to read or write, whether the construct is actually executed in runtime is irrelevant. Place is static approximation of a set of bytes in the memory. Place is either local (x), field (x.f), or upvar (local captured in closure). Indexing (x[i]) and method calls (x.m()) are approximated as local (x). Loan is either shared (&x) or mutable (&mut x). Read access conflicts with mutable loan, write access conflicts with all loans. Expression is live from the definition to the end of the containing statement. Declaration is live in connected region of control flow graph from the definition to potentially multiple last uses.

No, the rule I stated is what is actually implemented.

Like all type checking technology, the rule is about static behavior, not dynamic behavior. So halting problem is irrelevant.

Rust accepts all programs that follow the rule in static behavior. It's not a subset, and it's not a best effort to prove. It's the complete statement.

"Access" means source code construct to read or write. Rust doesn't care whether the construct is actually executed in runtime.

I believe the "subset of correct code" refers to things like the current non-support for things like

    let (left, right) = (&mut foo[..split], &mut foo[split..]);
where you have to rely on things like foo.split_at_mut(split), which are implemented as unsafe under the cover.

I see these are limitations but not show-stoppers in any way.

It's surely not a showstopper when it's on one line, but again the problem in practice is that those slices may be other operations or hidden behind other APIs, and happening at different depths in a complicated call tree, and you get an error only up at the top when what you're doing looks "clearly correct".

Yes, sure, the rule in question may be simple in the abstract (it's a compiler, after all -- they're just software doing straightforward things). But the ability for the poor programmer to detect which rule is being violated where is a lot more limited than the compiler is designed for. Thus the user with the upthread complaint, which is hardly unique.

I mean, at the end of the day if Rust wants to be an everyday language for this kind of everyday problem, the analysis paradigm needs to be communicated much better to everyday hackers (via docs, error messages, whatever). When Rust was new this seemed like just a technical problem to be solved with software maturity. I guess at this point after several years of regularly returning to play with Rust and being frustrated every time, I've mostly given up.

Sure, but you claimed there is no simple formal rule. That's simply false. It's just that the formal statement is not front and center in documentations, because it is not very useful to learn the rule, and most documentations are targeted to learners.
Yes, but the rule is still complete. What is going on is "place" is currently defined to be local x, or field x.f, or index x[i], but all x[i] are merged to x[*]. Rust's so-called NLL changed definition of "live". The point is, the rule is simple, formal, and complete. It is neither an ad hoc rule nor a best effort prover.
> What is going on is "place" is currently defined to be

Not to belabor this growing thread, but here's the disconnect. I complained the rules were ad hoc and complicated, you replied that they were simple, and when challenged on an edge case your treatment is to add another clause to re-define a term you used in isolation earlier.

That's what "ad hoc and complicated" means.

"place" and "live" have their respective intuitive definitions. To clarify, "place" is a set of bytes in the memory. "live" means it is used again. But both are runtime concepts, so Rust compiler needs to compute static approximation of these runtime concepts. This static approximation is what is complex and changing. But the principle is simple.

Actually, my elaboration of "place", local/field/index, is nearly complete. The only thing missing is upvar, which is local captured in closure. Pre-NLL "live" is simple: expression is live from the definition to the end of the containing statement, and declaration is live from the definition to the end of the containing block. Post-NLL "live" is connected region of control flow graph from the definition to potentially multiple last uses. Really, that is the whole story.

There are two changes currently in development. One changes definition of "place" to include field captured in closure, in addition to local captured in closure. The other is more drastic, changing definition of "live": a loan is live where the loan is origin of a variable and the variable is live. This is great, because current approximation is equivalent to union of all origins.

Link to the relevant spec then? Admittedly my intuition is a year stale at the moment, but I went looking hard for this at one point and found nothing other than source code.
No spec, because technical details are complex, but spec, when it is written, will just be an elaboration of "It is an error to access a place, when an access conflicts with a loan, and the loan is live". This has never changed since Rust 1.0, more than 5 years. Spec will just need to define "place" and "live" and that takes time.
There's no spec yet, but there are working groups working on it.