|
|
|
|
|
by jandrewrogers
2369 days ago
|
|
There are a couple common architectural patterns that are easy to express in C++ but tend to violate assumptions in Rust's safety model. Database engines run into them frequently due to the nature of their core operations, and this ignores that most data structures are intrinsically globals (which Rust doesn't like) due to tight hardware coupling. Rust assumes that all references to memory are visible at compile-time, and the safety analysis can be applied in cases where this is true with the usual caveats around borrow-checking. The "unsafe" facility is designed to interface with code written in other languages that don't respect Rust's model, and it works well for that. But how do you express the case, common in database engines (because direct storage-backed), where hardware can hold mutable references to most of your address space? There is no way to determine at compile-time if a mutable reference will be unique at runtime or to even sandbox it to a small bit of code. As a consequence, most memory reference are effectively mutable. There are workarounds that will minimize the quantity of unsafe code in Rust if you are willing to sacrifice performance and elegance. In databases, having many mutable references to the same memory has few safety implications because ownership of memory is dynamically assigned at runtime by a scheduler that guarantees safe access without locking or blocking. This safety model solves the hardware ownership problem, which is why it is used, but it also enables quite a bit of dynamic optimization even if all your references are in software so you'd want to do things this way anyway. In C++, you can make all of this largely transparent on top of explicitly mutable references to memory. Again, you can produce a minimally unsafe version of this in Rust but it is going to be significantly uglier and slower. As more server software moves to userspace I/O and scheduling models (for performance and scale reasons) it will be interesting to see how this impedance mismatch problem is addressed in Rust. |
|
As for hardware DMA’able memory, it’s true that it adds friction to work with in Rust. But C or C++ would fall into the same boat - one would need to sprinkle volatile or atomics, as appropriate, to avoid the optimizer from interfering. In Rust, you’d need to do the same (ptr::{read,write}_volatile or its atomics).
I’m having a slightly hard time imagining a db where “most” of the address space is DMA’able. I’ve some experience with kernel bypass networking, which has its own NIC interactions wrt memory buffers, but applications built on top have plenty of heap that’s unshared with hardware. What’s an example db where most of the VA is accessible to hardware “arbitrarily”?
Also, regardless of how much VA is shared, there’s going to be some protocol that the software and hardware use to coordinate. The interesting bit here is whether Rust and its type system can allow for expressing these protocols such that violations are compile-time detectable (if not all, perhaps some). Any sane C++ code would similarly try to build some abstractions around this, but how well things can be encoded is up for debate.
When a typical Rust discussion ensues, it’s commonly implied (or occasionally made explicit) that “write X in Rust” == “write X in safe Rust”. And this is the right default. But I think any non-trivial system hyper optimized for performance will have a healthy amount of unsafe code. The more interesting question, to me at least, is how well can that unsafety (and the “hidden”-from-rustc protocol) be contained.
As for a db scheduler obviating the need for a compiler to arbitrate ownership, that’s certainly true to a degree. But, this comes back to the protocol I mentioned - the scheduler is what provided the protocol and so long as other components work via it, it can provide safety barriers (and allow for optimizations). But again, I don’t immediately see why Rust (with careful use of unsafe) couldn’t do the same. And afterall, everything is safe so long as things play by the (often unwritten or poorly so) rules. Once systems get big and hairy, it gets tougher to stay within the guardrails and that’s where getting assistance from your language/compiler can be very helpful.
Some of the Rust libs/frameworks for embedded/microcontrollers deal with hardware accessible memory and otherwise “unsafe” protocols, but I’ve seen some clever ways folks encode this using Rust’s type system.