Hacker News new | ask | show | jobs
by scoutt 2378 days ago
But in this case the overhead may come from calling get_unchecked() for every array access, is this correct? Unless the function it's inlined and does a quantifiable amount of work (for a given arch), but this may also have an impact on code size. In a way or another, there should be a trade-off for accessing array elements (a very common operation).

I don't know how all of this can be coupled with the embedded ecosystem we are used to work with.

2 comments

So, here's a fun example of how this works out in practice: https://godbolt.org/z/DXo25P

Here, rustc is able to see that we always have a first element, and will actually completely remove the unchecked one, and replace the body with the "checked" one, which has no checks.

For some reason, I can't get it to show the assembly for just the two functions; it always optimzies everything out. Putting it on the rust playground says this:

  playground::access_first_element: # @playground::access_first_element
  # %bb.0:
 pushq %rax
 cmpq $2, %rsi
 jb .LBB5_2
  # %bb.1:
 movq %rdi, %rax
 addq $4, %rax
 popq %rcx
 retq

  .LBB5_2:
 movq %rsi, %rdx
 leaq .L__unnamed_2(%rip), %rdi
 movl $1, %esi
 callq *core::panicking::panic_bounds_check@GOTPCREL(%rip)
 ud2
                                        # -- End function

  playground::access_first_unchecked: # @playground::access_first_unchecked
  # %bb.0:
 leaq 4(%rdi), %rax
 retq
                                        # -- End function
as you can see, it gets super inlined, and does no work, compared to the bound checked version.
Thank you and the other users for the patience.

I guess in the case of your code, the checks are removed/optimized because the compiler knows the size of the array (since it's static, as in non-dynamically-allocated)?

Probably the out-of-bounds runtime checks will (or should be) enforced when it's about dynamically allocated arrays.

Yep!

Rust can do a lot, even for dynamically allocated ones. Here's a fun example: https://godbolt.org/z/n4ubZS

You can see that add has to do the bounds checks, because it doesn't know how long the slice is. But, foo gets compiled down to a single lea, because even though we create a dynamic length vector, the compiler can tell that it has two elements, and get rid of all of it: the malloc, the bounds checks, everything.

Or take these two versions of a summation function: https://godbolt.org/z/LzGURB

The first uses a manual loop, but rustc eliminates all of the bounds checks, because it knows that it can't possibly go out of bounds. For completeness, I included the iterator version too; you can see the ASM is (I think) identical. (I only looked at the line count and glanced, I didn't compare all 89 lines exactly.)

If rustc can't analyze something, but you can, you can also help hint it with asserts. I don't have an easy example handy, but for example, if rustc does a bounds check in the body of the loop, you could assert! outside of the loop and it has the effect of hoisting the check manually. It's pretty good at doing this on its own, though.

get_unchecked() is marked as an #[inline] function (like most other low level primitives), and through the magic of compiler optimization it should have exactly 0 overhead. The function should most likely compile down to a single CPU instruction.