Hacker News new | ask | show | jobs
by NobodyNada 461 days ago
Rust's raw pointers are more-or-less equivalent to C pointers, with many of the same types of potential problems like dangling pointers or out-of-bounds access. Rust's references are the "safe" version of doing pointer operations; raw pointers exist so that you can express patterns that the borrow checker can't prove are sound.

Rust encourages using unsafe to "teach" the language new design patterns and data structures; and uses this heavily in its standard library. For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.

Think of unsafe not as "this code is unsafe", but as "I've proven this code to be safe, and the borrow checker can rely on it to prove the safety of the rest of my program."

1 comments

Why does Vec need to have any unsafe code? If you respond "speed"... then I will scratch my chin.

    > For example, the Vec type is a wrapper around a raw pointer, length, and capacity; and exposes a safe interface allowing you to create, manipulate, and access vectors with no risk of pointer math going wrong -- assuming the people who implemented the unsafe code inside of Vec didn't make a mistake, the external, safe interface is guaranteed to be sound no matter what external code does.
I'm sure you already know this, but you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.
Rust doesn't have compiler-magic support for anything like a vector. The language has syntax for fixed-sized arrays on the stack, and it supports references to variable-length slices; but it has no magic for constructing variable-length slices (e.g. C++'s `new[]` operator). In fact, the compiler doesn't really "know" about the heap at all.

Instead, all that functionality is written as Rust code in the standard library, such as Vec. This is what I mean by using unsafe code to "teach" the borrow checker: the language itself doesn't have any notion of growable arrays, so you use unsafe to define its semantics and interface, and now the borrow checker understands growable arrays. The alternative would be to make growable arrays some kind of compiler magic, but that's both harder to implement correctly and not generalizable.

> you can do exactly the same in C by using an opaque pointer to protect the data structure. Then you write a bunch of functions that operate on the opaque pointer. You can use assert() to protect against unreasonable inputs.

That's true and that's a great design pattern in C as well. But there are some crucial differences:

- Rust has no undefined behavior outside of unsafe blocks. This means you only need to audit unsafe blocks (and any invariants they assume) to be sure your program is UB-free. C does not have this property even if you code defensively at interface boundaries.

- In Rust, most of the invariants can be checked at compile time; the need for runtime asserts is less than in C.

- C provides no way to defend against dangling pointers without additional tooling & runtime overhead. For instance, if I write a dynamic vector and get a pointer to the element, there's no way to prevent me from using that pointer after I've freed the vector, or appended an element causing the container to get reallocated elsewhere.

Rust isn't some kind of silver bullet where you feed it C-like code and out comes memory safety. It's also not some kind of high-overhead garbage collected language where you have to write unsafe whenever you care about performance. Rather, Rust's philosophy is to allow you to define fundamental operations out of small encapsulated unsafe building blocks, and its magic is in being able to prove that the composition of these operations is safe, given the soundness of the individual components.

The stdlib provides enough of these building blocks for almost everything you need to do. Unsafe code in library/systems code is rare and used to teach the language of new patterns or data structures that can't be expressed solely in terms of the types exposed by the stdlib. Unsafe in application-level code is virtually never necessary.