Yep. The array index pattern is unsafe code without the unsafe keyword. Amazing how much trouble Rust people go through to make code "safe" only to undermine this safety by emulating unsafe code with safe code.
It’s not the same. The term “safe” has a specific meaning in rust: memory safety. As in:
- no buffer overflows
- no use after free
- no data races
These problems lead to security vulnerabilities whose scope extends beyond your application. Buffer overflows have historically been the primary mechanism for taking over entire machines. If you emulate pointers with Rust indices and don’t use “unsafe”, those types of attacks are impossible.
What you’re referring to here is correctness. Safe Rust still allows you to write programs which can be placed in an invalid state, and that may have security implications for your application.
It would be great if the compiler could guarantee that invalid states are unreachable. But those types of guarantees exist on a continuum and no language can do all the work for you.
"Safe" as a colloquial meaning: free from danger. The whole reason we care about memory safety is that memory errors become security issues. Rust does nothing to prevent memory leaks and deadlocks, but it does prevent memory errors becoming arbitrary code execution.
Rust programs may contain memory errors (e.g. improper use of interior mutability and out of bounds array access), but the runtime guarantees that these errors don't become security issues.
This is good.
When you start using array indices to manage objects, you give up some of the protections built into the Rust type system. Yes, you're still safe from some classes of vulnerability, but other kinds of vulnerabilities, ones you thought you abolished because "Rust provides memory safety!!!", reappear.
Rust is a last resort. Just write managed code. And if you insist on Rust, reach for Arc before using the array index hack.
Still, being free from GC is important in some domains. Beyond being able to attach types to scopes via lifetimes, it also provides runtime array bounds checks, reference-counting shared pointers, tagged unions, etc. These are the techniques used by managed languages to achieve memory-safety and correctness!
For me, Rust occupies an in-between space. It gives you more memory-safe tools to describe your problem domain than C. But it is less colloquially "safe" than managed languages because ownership is hard.
Your larger point with indices is true: using them throws away some benefits of lifetimes. The issue is granularity. The allocation assigned to the collection as a whole is governed by rust ownership. The structures you choose to put inside that allocation are not. In your user ID example, the programmer of that system should have used a generational arena such as:
It solves exactly this problem. When you `free` any index, it bumps a counter which is paired with the next allocated index/slot pair. If you want to avoid having to "free" it manually, you'll have to devise a system using `Drop` and a combination of command queues, reference-counted cells, locks, whatever makes sense. Without a GC you need to address the issue of allocating/freeing slots for objects within in an allocation in some way.
Much of the Rust ecosystem is libraries written by people who work hard to think through just these types of problems. They ask: "ok, we've solved memory-safety, now how can we help make code dealing with this other thing more ergonomic and correct by default?".
Absolutely. If I had to use an index model in Rust, I'd use that kind of generational approach. I just worry that people aren't going to be diligent enough to take precautions like this.
Even when you use array indices, I don't think you give those protections up. Maybe a few, sure, but the situation is still overall improved.
Many of the rules references have to live by, are also applied to arrays:
- You cannot have two owners simultaneously hold a mutable reference to a region of the array (unless they are not overlapping)
- The array itself keeps the Sync/Send traits, providing thread safety
- The compiler cannot do provenance-based optimizations, and thus cannot introduce undefined behavior; most other kinds of undefined behavior are still prevented
- Null dereferences still do not exist and other classes of errors related to pointers still do not exist
Logic errors and security issues will still exist of course, but Rust never claimed guarantees against them; only guarantees against undefined behavior.
I'm not going to argue against managed code. If you can afford a GC, you should absolutely use it. But, compared to C++, if you have to make that choice, safety-wise Rust is overall an improvement.
You can still have use-after-free errors when you use array indices.
This can happen if you implement a way to "free" elements stored in the vector.
"free" should be interpreted in a wide sense.
There's no way for Rust to prevent you from marking an array index as free and later using it.
> There's no way for Rust to prevent you from marking an array index as free and later using it.
I 2/3rds disagree with this. There are three different cases:
- Plain Vec<T>. In this case you just can't remove elements. (At least not without screwing up the indexes of other elements, so not in the cases we're talking about here.)
- Vec<Option<T>>. In this case you can make index reuse mistakes. However, this is less efficient and less convenient than...
- SlotMap<T> or similar. This uses generational indexes to solve the reuse problem, and it provides other nice conveniences. The only real downside is that you need to know about it and take a dependency.
The consequences of use-after-free are different for the two.
In rust it is a logic error, which leads to data corruption or program panics within your application. In C it leads to data corruption and is an attack vector for the entire machine.
And yes, while Rust itself doesn’t help you with this type of error, there are plenty of Rust libraries which do.
The semantics of a POSIX program are well-defined under arbitrary memory corruption too --- just at a low level. Even with a busted heap, execution is deterministic and the every interaction with the kernel has defined behavior --- even if they behavior is SIGSEGV.
Likewise, safe but buggy Rust might be well-defined at one level of abstraction but not another.
Imagine an array index scheme for logged-in-user objects. Suppose we grab an index to an unprivileged user and stuff it in some data structure, letting it dangle. The user logs out. The index is still around. Now a privileged user logs in and reuses the same slot. We do an access check against the old index stored in the data structure. Boom! Security problems of EXACTLY the sort we have in C.
It doesn't matter that the behavior is well-defined at the Rust level: the application still has an escalation of privilege vulnerability arising from a use-after-free even if no part of the program has the word u-n-s-a-f-e.
Undefined behavior in C/C++ has a different meaning than you're using. If a compiler encounters a piece of code that does something whose behavior is undefined in the spec, it can theoretically emit code that does anything and still be compliant with the standards. This could include things like setting the device on fire and launching missiles, but more typically is something seemingly innocuous like ignoring that part of the code entirely.
An example I've seen in actual code:
You checked for null before dereferencing a variable, but there is one code path that bypasses the null check. The compiler knows that dereferencing a null pointer is undefined so it concludes that the pointer can never be null and removes the null checks from all of the code paths as an "optimization".
That's the C/C++ foot-gun of undefined behavior. It's very different from memory safety and correctness that you're conflating it with.
From the kernel's POV, there's no undefined behavior in user code. (If the kernel knew a program had violated C's memory rules, it could kill it and we wouldn't have endemic security vulnerabilities.) Likewise, in safe Rust, the access to that array might be well defined with respect to Rust's view of the world (just like even UB in C programs is well defined from the kernel POV), but it can still cause havoc at a higher level of abstraction --- your application. And it's hard to predict what kind of breakage at the application layer might result.
- no buffer overflows - no use after free - no data races
These problems lead to security vulnerabilities whose scope extends beyond your application. Buffer overflows have historically been the primary mechanism for taking over entire machines. If you emulate pointers with Rust indices and don’t use “unsafe”, those types of attacks are impossible.
What you’re referring to here is correctness. Safe Rust still allows you to write programs which can be placed in an invalid state, and that may have security implications for your application.
It would be great if the compiler could guarantee that invalid states are unreachable. But those types of guarantees exist on a continuum and no language can do all the work for you.