by whytevuhuni 489 days ago
Note that memory-safe languages like Java and Python also depend on code that is not memory-safe, e.g. the implementation of the interpreter/VM, parts of the standard library, native libraries like numpy, etc.

In practice unsafe Rust tends to be less than 1% of the code (or around 5% in low-level things like kernels), which is a similar ratio of trusted/untrusted code. The top-level application code is generally expected to have 0%.

This statistic is based on the lines of code bounded by the "unsafe" keyword but that is not the full picture. Unsafe code relies for its safety on the logical correctness of safe code. In theory, as "proper practice", the trust boundary should be the crate. In other words, the safety of a particular "unsafe" block should be verifiable by checking the correctness of code within the same crate. In practice, this is not true: much unsafe code relies for safety on correctness properties in external code, especially from the standard library, like the fact that "Vec" stores elements contiguously in memory in order of index.

That isn't to say the concept is useless. But it is not the case that you just need to inspect code marked "unsafe". A change to code not marked "unsafe" can break the invariants and assumptions that unsafe blocks elsewhere rely on.
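To make that concrete, here is a minimal sketch (the `Prefix` type and its methods are hypothetical, invented for illustration) in which the only `unsafe` block is sound solely because a safe method elsewhere in the type keeps an invariant, and because `Vec` stores its elements contiguously:

```rust
/// Hypothetical type: a view of the first `len` bytes of `data`.
pub struct Prefix {
    data: Vec<u8>,
    len: usize, // invariant: len <= data.len()
}

impl Prefix {
    pub fn new(data: Vec<u8>) -> Self {
        let len = data.len();
        Prefix { data, len }
    }

    /// Entirely safe code, yet the `unsafe` block in `last` depends on it.
    /// Replacing the body with `self.len = new_len;` still compiles without
    /// any `unsafe` keyword and turns `last` into an out-of-bounds read.
    pub fn truncate(&mut self, new_len: usize) {
        self.len = new_len.min(self.len);
    }

    pub fn last(&self) -> Option<u8> {
        if self.len == 0 {
            return None;
        }
        // SAFETY: relies on `len <= data.len()`, which only the safe
        // methods above maintain, and on `Vec` storing its elements
        // contiguously in index order.
        Some(unsafe { *self.data.as_ptr().add(self.len - 1) })
    }
}
```

Auditing only the three lines inside `unsafe` would not catch a bug introduced in `truncate`; the whole type has to be reviewed.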

Agreed. I would add to your argument as follows.

If you stick with "21st Century C++" [1], you never have to use a pointer. At some point, there will be (say) a -W20 flag that produces a warning if you use a pointer or other potentially-unsafe idiom without marking it with a [[20th_century]] attribute.

That attribute is equivalent to Rust's unsafe keyword. At that point, the safety difference will boil down to opt-in versus opt-out. But you can already opt-in today by policing yourself.

That's quite far from the radical language difference it's made out to be.

[1] https://news.ycombinator.com/item?id=42946321

I've used both, and I'd say it's still quite a difference.

There are many examples, but one I stumble on often is functions that accept a std::string_view, and return a part of that string. It's quite dangerous to make it return a std::string_view as well, especially given how easy it is to create temporaries (e.g. get_word(sentence1 + sentence2) would return a string_view to temporary data). Similar story with functions that return element references out of collections, functions that choose an element out of two, etc.

In Rust on the other hand, you can get very reckless with very long chains of functions that return references out of other references, especially with parsers, iterators, and lambdas/closures.
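A rough Rust counterpart of the `get_word` case, as a sketch (the function names are illustrative, not taken from any library). The returned `&str` borrows from the argument, and the compiler rejects any use that would outlive a temporary:

```rust
/// Returns the first whitespace-separated word, borrowing from `sentence`.
/// The elided lifetime ties the result to the argument.
fn get_word(sentence: &str) -> &str {
    sentence.split_whitespace().next().unwrap_or("")
}

/// Chooses one of two references; the result may borrow from either input,
/// so both inputs must outlive it.
fn longer<'a>(a: &'a str, b: &'a str) -> &'a str {
    if a.len() >= b.len() { a } else { b }
}

fn first_word_of_joined(s1: &str, s2: &str) -> String {
    // The temporary String lives until the end of this statement, so
    // borrowing from it is fine as long as the word is copied out first.
    let word = get_word(&(s1.to_owned() + s2)).to_owned();
    word
    // By contrast, this is rejected at compile time (E0716, temporary
    // value dropped while borrowed):
    //     let w = get_word(&(s1.to_owned() + s2));
    //     println!("{w}");
}
```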

The Send/Sync traits for thread safety are also unmatched in C++. Pattern matching on tagged unions heavily restricts invalid states, and so do Rust's affine(ish) types, which force a single owner for resources, as opposed to C++'s somewhat hard-to-use move semantics.
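A small sketch of those three features together (the `Conn` enum and the helper functions are made up for illustration):

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// A tagged union: a connection is in exactly one state and carries
/// only the data that belongs to that state.
enum Conn {
    Idle,
    Active { bytes_sent: u64 },
    Closed(String),
}

fn describe(c: &Conn) -> String {
    // `match` must be exhaustive; removing an arm is a compile error.
    match c {
        Conn::Idle => "idle".to_string(),
        Conn::Active { bytes_sent } => format!("active, {bytes_sent} bytes sent"),
        Conn::Closed(reason) => format!("closed: {reason}"),
    }
}

fn parallel_sum(n: u64) -> u64 {
    // Arc<Mutex<u64>> is Send + Sync, so it may be shared across threads.
    // Swapping in Rc<RefCell<u64>> makes `thread::spawn` fail to compile,
    // because Rc is not Send.
    let total = Arc::new(Mutex::new(0u64));
    let handles: Vec<_> = (1..=n)
        .map(|i| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                *total.lock().unwrap() += i;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let result = *total.lock().unwrap();
    result
}

/// Takes ownership of `v`. After calling `take_ownership(v)` the caller
/// can no longer use `v`; doing so is a compile error (E0382).
fn take_ownership(v: Vec<u8>) -> usize {
    v.len()
}
```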

Rust's standard library also prefers safety over speed by default, whereas in C++ out-of-bounds indexing of a std::string_view is UB, dereferencing an empty std::optional is UB, and so on.
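On the Rust side, the safe-by-default behaviour looks roughly like this (the helper names are illustrative):

```rust
use std::panic;

fn third_byte(s: &str) -> Option<u8> {
    // `get` returns None instead of reading out of bounds.
    s.as_bytes().get(2).copied()
}

fn value_or_zero(x: Option<i32>) -> i32 {
    // Safe code has no unchecked "dereference" of an Option: the empty
    // case must be handled, or `unwrap` called, which panics when empty.
    x.unwrap_or(0)
}

fn index_panics(v: &[i32], i: usize) -> bool {
    // Out-of-bounds indexing is a defined panic, not undefined behaviour.
    // The unchecked variant exists, but only behind `unsafe`:
    //     unsafe { *v.get_unchecked(i) }
    panic::catch_unwind(|| v[i]).is_err()
}
```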

I don't really disagree with what you wrote. To expand upon my argument, the issues you mention can also be self-policed, or maybe enforced with higher safety levels of the presumptive -W20 switch:

- No reference variables in user code (reference parameters are OK.)

- Any classes that hold reference member variables can only be instantiated on the stack. The objects the references refer to must either be stack objects or reference parameters to the instantiating function.

- For durable links to sub-objects, use handles (tuples of std::shared_ptr to the main object plus a sub-object link or identifier.)

- Alternatively, require externally-accessible sub-objects to be held in shared_ptrs by the main object. They can then have superset lifetimes when needed.

I'll stick to the memory safety topic, but similar solutions exist for some of the other topics you mention, e.g. by avoiding std::optional.

That's right, unsafety extends to the whole module, and anything else that relies on the module's internal implementation. Generally all the way up to the abstraction's public API, whatever that abstraction is (a Vec, a Mutex, etc).

But it's nevertheless still not different from other languages that are deemed memory safe.

Modules are meant to be the boundary for a safe abstraction, but sometimes you get fun holes like https://github.com/rust-lang/rust/issues/128872 where you can implement Copy for a type in a different module in the same crate, even if the fields are private.

I'll admit, that's a pretty funny bug. Thankfully it seems like the Rust devs are pretty keen on fixing it somehow.
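For readers who want to see the shape of that hole, a minimal sketch based on the description above (the `token` module and `Token` type are hypothetical, and this assumes the compiler still accepts the impls as the issue describes):

```rust
mod token {
    /// Meant to be a unique handle: private field, and deliberately
    /// neither Clone nor Copy inside this module.
    pub struct Token {
        id: u32,
    }

    impl Token {
        pub fn new(id: u32) -> Self {
            Token { id }
        }

        pub fn id(&self) -> u32 {
            self.id
        }
    }
}

// Outside the module but in the same crate: these impls never name the
// private field, so privacy does not stop them.
impl Clone for token::Token {
    fn clone(&self) -> Self {
        *self
    }
}

impl Copy for token::Token {}

/// Duplicates a value the module author intended to be unique.
fn duplicate(t: token::Token) -> (token::Token, token::Token) {
    (t, t)
}
```

If `Token` guarded an invariant used by `unsafe` code in its module (say, exclusive ownership of a resource), code elsewhere in the crate could break that invariant without writing `unsafe`.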