Hacker News new | ask | show | jobs
by gobdovan 20 days ago
Rust is in an awkward position of being already complicated enough that adding proofs for skipping bounds checks probably will not happen for a long time, even though this kind of low-level operation is where a lot of optimisation is lost.

Compounding on this, Rust is also unstable underneath, since there is no public, stable contract for carrying high-level semantics from HIR into MIR. Because these high-level invariants are lost during compilation, the compiler cannot easily use them to prove and eliminate low-level safety checks. But even if the frontend was perfect, Rust relies on LLVM's language-neutral SCEV, which operates purely on low-level math and cannot reason about high-level language semantics.

Ultimately, a lot of things would need to change for Rust to pay no performance for safety features.

4 comments

> Compounding on this, Rust is also unstable underneath, since there is no public, stable contract for carrying high-level semantics from HIR into MIR. Because these high-level invariants are lost during compilation

Not sure if I'm just out of the loop, but I'm having a hard time following this line of reasoning. Why is a public and/or stable contract needed to carry high-level semantics from HIR to MIR? Neither seems necessary to me; from what I understand HIR and MIR are rustc-internal so public contracts shouldn't matter, and the lack of stability means the Rust devs aren't precluded by backwards compatibility from modifying the IRs to add the ability to carry such invariants.

Whoops! Although there is no public contract between HIR and MIR, the public part was not relevant here. What I wanted to highlight is that if they'd want to add proper proof machinery to eliminate low-level safety checks, they'd have to do it at: surface language, which is already complex enough; then HIR->MIR boundary with clean provenance (which I think current MIR would collapse too aggressively) and which may require a much clearer contract; then, even if they do the full front- and mid- ends properly, if you leave it up to LLVM, it ends up in SCEV, which is language neutral and would not be a good fit to support the proof obligations that would be specific to Rust.

I dug up a proposal from 2021 around bounds check hoisting in MIR, and from the discussion, details are pretty thorny [0]. It's narrower than general proofs but the frictions are very similar. The easiest example that shows HIR -> MIR difficulties is the part around `for i in 0..32 { a[i] = 1; }`. At the source level the range fact is super obvious, but after the for-loop/iterator lowering the MIR optimiser has to recover that `i` comes from exactly that range before it can turn 32 checks into the one hot-path check. Then it also would have to check for panic strategy to maintain the correct behaviour after optimisation.

[0] https://github.com/rust-lang/rust/issues/92327

Of course you can write the above as:

a[0..32].iter_mut().for_each(|el| *el = 1)

and have per-iteration bounds checks elided in Rust today.

Or as mentioned in the OP, just add at the top:

    assert!(a.len() >= 32);
    for i in 0..32 {
        a[i] = 0;
    }
Or:

    for i in 0..std::cmp::min(a.len(), 32) {
        a[i] = 0;
    }

I confess I hadn't thought about the implications of any of this before reading the article. If you need to squeeze the last 10% of performance out of your code, I'd consider it required reading.

As for the speed comparisons with C++, the OP says at the end you tell the C++ compiler to be as strict as Rust using "-D_FORTIFY_SOURCE=3 -fsanitize=bounds,object-size" & hardened STL, then it slows to below Rust speeds for the same safety unless you use the same techniques.

It's a shame the other optimisation techniques you need to bring Rust in line with C++ aren't as easy to apply.

Both rewrites differ semantically from:

for i in 0..32 { a[i] = 1; }

If a.len() == 16, the indexed loop writes a[0]..a[15] and then panics at a[16]. By contrast, both assert!(a.len() >= 32); and a[0..32].iter_mut().for_each(|el| *el = 1) fail before any writes occur. The former at the explicit assertion, the latter while creating the a[0..32] subslice. That difference is observable if the panic is caught, and the panic location/message may also differ. This is why these are valid manual rewrites only when the intended precondition is "the slice has length at least 32," not generally valid compiler rewrites of the original loop.

The GitHub issue discussion is directly about these concerns and discuss whether bounds checks may fail early, whether intermediate writes are observable after catch_unwind and whether panic behavior must be preserved.

> The GitHub issue discussion is directly about these concerns and discuss whether bounds checks may fail early, whether intermediate writes are observable after catch_unwind and whether panic behavior must be preserved.

No argument about the point of the issue. But this is a discussion about the relative efficiency of C, C++ and Rust. My point is there is a way in Rust to say "I don't care about observable writes, hoist the bounds check out of the loop", so that the efficiency is the same.

Admittedly, it's not part of the language definition. You're relying on intimate knowledge of how the optimiser works. In fact, you are probably pasting the code into godbolt, and looking at the assembler produced. But if you care about cycles that much, that's true for all three languages.

That's relevant if we're talking about the compiler automatically rewriting the code, but the chances are if you're writing this code yourself that the array will always have >= 32 elements.
OK, I think that makes more sense. Thanks for taking the time to explain!
The overhead of bounds checking varies a lot. In the common case it's negligible (few percents), but in some cases, depending on what you build, it can go up to even 20%. And if it prevents autovectorization it can cost even more.

There are techniques to minimize the perf loss, though (safely), and of course you can use unsafe code. If you do it smartly, in the vast majority of cases bound checks do not matter (in fact, even in C++ there is a push for a hardened standard library that does bound checks, and e.g. Google uses that).

Rust will never include full proofs, but it might include ranged integers which can minimize bound checks even more.

The benchmarks in this talk show that the bounds checks are mostly insignificant, and actually it's the integer overflow checks that are far more costly.

Actually nm, I forgot those are disabled in release mode. Good decision I guess.

Do they even count towards safety if they aren't in release mode?
You can enable them in release mode optionally. But I would say not. Really we need ISAs to provide a no-overhead way to check integer overflow.
You can sometimes just add asserts for the index variable(s) and have the LLVM optimizer go "hmm, I should try to prove that those are true" and then get the range checks optimized away.