Hacker News new | ask | show | jobs
by apendleton 1302 days ago
In Rust at least, once you instantiate the iterator, the array it's iterating over can't be mutated until the iterator is dropped, and that can be statically guaranteed at compile time. So you don't need to bounds-check at every access; you can decide at the outset how many iterations there are going to be, and doing that number of iterations will be known not to walk past the end.
1 comments

I don't think that's always possible in practice: consider Vec<T>, whose size is only known at runtime. A Vec<T>'s iterator can only do runtime bounds checking to avoid walking past the end.

That said, this is unavoidable in C/C++ too.

The Rust compiler guarantees that the memory location and size of the iterated array do not change during the operation. So the iterator can be a pointer that iterates until it points to the end of the array. There is no need to do bounds checks: the pointer only goes over the valid range.

In C/C++, the array can change. It might be moved, de-allocated or resized in the current or a synchronous thread. So the pointer that iterates until it is equal to the end pointer, might point to invalid data if the size, location or existence of the vector changes.

I think we're suffering from some fuzziness about what bounds checks we're referring to. Even in your example, you only need to check the size of the Vec<T> when you instantiate the iterator, not each time the iterator accesses an element, because at the time the iterator over the Vec<T>'s contents is instantiated, the Vec<T>'s size is known, and it can't change over the life of the iterator (because mutation is disallowed when there's an outstanding borrow). With a regular for-loop:

    for i in 0..v.len() {
        println!("{:?}", v[i]);
    }
you check the length at the top (the `v.len()`) and also for each `v[i]`. The first is unavoidable, but the second can be skipped when using an iterator instead, because it can be statically guaranteed that, even if you don't know at compile time what concretely the length is, whatever it ends up being, the index will never exceed it. Rust specifically differs from C++ in this respect, because nothing in that language prevents the underlying vector's length from changing while the iterator exists, so without per-access bounds checks it's still possible for an iterator to walk past the end.
When I read "iterator" I think of an object that points into the vector and can be advanced. For Rust's vector, that is std::slice::Iter (https://doc.rust-lang.org/std/slice/struct.Iter.html). When you advance an iterator, you must do a bounds check if the vector is dynamically sized; otherwise, you don't know when to stop. I.e., if I have

  let mut it = vec.iter();
  println!(it.next());
  println!(it.next());
  println!(it.next());
This needs to do bounds checking on each call to next() to either return Some(a) or None (assuming the length of vec is unknown at compile time). (hhttps://doc.rust-lang.org/beta/src/core/slice/iter/macros.rs....)

You are right that theoretically a range-based for loop that uses iterators does not need to do bounds checking because a compiler can infer the invariant that the iterator is always valid. In practice I don't know enough about LLVM or rustc to know whether this optimization is actually happening.

That's not an index-out-of-bounds check tho, that's just the regular loop exit condition. In a C-style for(i=0; i<42; ++i) that would be the i<42 part.

The check that can (and is) elided by the iterator here is the one that would be in the body of the loop, when you try to use i to index into the array. The nice thing about it is that it doesn't require the compiler to be smart at all.

The Rust docs for the 'for' keyword do say for loops are implemented as sugar for iterator loops. https://doc.rust-lang.org/std/keyword.for.html

Relevant part: > for-in-loops, or to be more precise, iterator loops, are a simple syntactic sugar over a common practice within Rust, which is to loop over anything that implements IntoIterator until the iterator returned by .into_iter() returns None (or the loop body uses break).

(The other uses of the 'for' keyword it refers to are unrelated to loops)

What parent means is that you won't have any bound checks on array access, just a len(arr) loop.