by rstuart4133 16 days ago
Or as mentioned in the OP, just add at the top:

    assert!(a.len() >= 32);
    for i in 0..32 {
        a[i] = 0;
    }
Or:

    for i in 0..std::cmp::min(a.len(), 32) {
        a[i] = 0;
    }
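
The two variants above are not interchangeable, which a small sketch can make concrete. It assumes `a` is a `&mut [u8]` (the comment doesn't say what the element type is), and the function names are made up for illustration:

```rust
/// Strict variant: panics up front if the slice has fewer than 32 elements,
/// so no element is touched in that case.
fn zero_first_32_strict(a: &mut [u8]) {
    assert!(a.len() >= 32);
    for i in 0..32 {
        a[i] = 0;
    }
}

/// Clamped variant: never panics, but silently zeroes fewer than 32
/// elements when the slice is short. Returns how many were zeroed.
fn zero_first_32_clamped(a: &mut [u8]) -> usize {
    let n = std::cmp::min(a.len(), 32);
    for i in 0..n {
        a[i] = 0;
    }
    n
}
```

On a 40-element slice both zero exactly the first 32 elements. On a 16-element slice the strict version panics and the clamped version zeroes all 16 and returns 16, so which one is right depends on what the caller actually wants.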

I confess I hadn't thought about the implications of any of this before reading the article. If you need to squeeze the last 10% of performance out of your code, I'd consider it required reading.

As for the speed comparisons with C++, the OP says at the end that if you tell the C++ compiler to be as strict as Rust, using "-D_FORTIFY_SOURCE=3 -fsanitize=bounds,object-size" and a hardened STL, it slows to below Rust speeds for the same safety unless you use the same techniques.

It's a shame the other optimisation techniques you need to bring Rust in line with C++ aren't as easy to apply.


Both rewrites differ semantically from:

    for i in 0..32 { a[i] = 1; }

If a.len() == 16, the indexed loop writes a[0]..a[15] and then panics at a[16]. By contrast, both the assert!(a.len() >= 32); version and a[0..32].iter_mut().for_each(|el| *el = 1) fail before any writes occur: the former at the explicit assertion, the latter while creating the a[0..32] subslice. That difference is observable if the panic is caught, and the panic location/message may also differ. This is why these are valid manual rewrites only when the intended precondition is "the slice has length at least 32"; they are not generally valid compiler rewrites of the original loop.
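
To make the observable difference concrete, here is a minimal sketch. It assumes `u8` elements and the default panic=unwind setting (with panic=abort nothing can be caught); `run_on_short_buffer` and the `fill_*` names are hypothetical helpers, not anything from the issue:

```rust
use std::panic::{self, AssertUnwindSafe};

/// The original indexed loop: every a[i] is bounds-checked on its own.
fn fill_indexed(a: &mut [u8]) {
    for i in 0..32 {
        a[i] = 1;
    }
}

/// Explicit assertion first: panics before any write on a short slice.
fn fill_asserted(a: &mut [u8]) {
    assert!(a.len() >= 32);
    for i in 0..32 {
        a[i] = 1;
    }
}

/// Subslice first: panics while creating a[0..32] on a short slice.
fn fill_subslice(a: &mut [u8]) {
    a[0..32].iter_mut().for_each(|el| *el = 1);
}

/// Runs `f` on a zeroed 16-element buffer, catches the panic, and returns
/// (did it panic, how many elements were written before the panic).
fn run_on_short_buffer(f: fn(&mut [u8])) -> (bool, usize) {
    let mut buf = [0u8; 16];
    let result = panic::catch_unwind(AssertUnwindSafe(|| f(&mut buf)));
    let written = buf.iter().filter(|&&b| b == 1).count();
    (result.is_err(), written)
}
```

All three panic on the 16-element buffer, but `run_on_short_buffer(fill_indexed)` reports 16 elements written, while `fill_asserted` and `fill_subslice` report 0. The panic messages printed to stderr differ as well.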

The GitHub issue discussion is directly about these concerns: it discusses whether bounds checks may fail early, whether intermediate writes are observable after catch_unwind, and whether panic behavior must be preserved.

> The GitHub issue discussion is directly about these concerns: it discusses whether bounds checks may fail early, whether intermediate writes are observable after catch_unwind, and whether panic behavior must be preserved.

No argument about the point of the issue. But this is a discussion about the relative efficiency of C, C++ and Rust. My point is there is a way in Rust to say "I don't care about observable writes, hoist the bounds check out of the loop", so that the efficiency is the same.

Admittedly, it's not part of the language definition. You're relying on intimate knowledge of how the optimiser works. In fact, you are probably pasting the code into godbolt and looking at the assembly it produces. But if you care about cycles that much, that's true for all three languages.
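
For what it's worth, here are two other ways of spelling "check the length once, then write". This is only a sketch: it assumes `u8` elements, the function names are invented for the example, and whether the optimiser really drops the per-element checks is something you'd still have to confirm in the generated assembly.

```rust
/// One range check when the subslice is formed (panics there if
/// a.len() < 32); the iterator over the subslice then needs no indexing.
fn zero_prefix_resliced(a: &mut [u8]) {
    let head = &mut a[..32];
    for x in head.iter_mut() {
        *x = 0;
    }
}

/// Same idea using the standard library's slice::fill.
fn zero_prefix_fill(a: &mut [u8]) {
    a[..32].fill(0);
}
```

Like the assert! version, both panic before writing anything if the slice is shorter than 32 elements, so they carry the same "I don't care about partial writes" meaning.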

That's relevant if we're talking about the compiler automatically rewriting the code, but if you're writing this code yourself, chances are the array will always have at least 32 elements.