Hacker News new | ask | show | jobs
by tatterdemalion 3067 days ago
> The major difficulty here is that in general, unsafe pieces of code cannot be safely composed, even if the unsafe pieces of code are individually safe. This allows you to bypass runtime safety checks without unsafe code just by composing "safe" modules that internally use unsafe code in their implementation.

This comment suggests you don't have much domain knowledge about how `unsafe` in Rust works, so I'm surprised you speak with such confidence. Your comment is flatly wrong: users using only safe code are not responsibility for guaranteeing the composed safety of the components they use (whether or not they are implemented with unsafe code).

Interfaces marked safe must uphold Rust's safety guarantees, or they are incorrect. They are just wrong if they have additional untyped invariants that need to be maintained to guarantee their safety; interfaces like this must be marked `unsafe`.

Because they cannot depend on untyped invariants, any correct implementation with a safe interface can be composed with any other. This ability to create safe abstractions over unsafe code which extend the reasoning ability of the type system is a fundamental value proposition of Rust.

2 comments

> This comment suggests you don't have much domain knowledge about how `unsafe` in Rust works, so I'm surprised you speak with such confidence.

I hate being tone police, but jeez, we're having a discussion about Rust here and talking about my personal competency is inappropriate and unwelcome.

The problem I'm talking about happens when you write libraries that contain "unsafe" blocks. You want to prove (or at least assure yourself) that no unsafe behavior is observable by clients of the library. However, the way to do this is not entirely clear, although there is research being done in this area. One known trap is that it is not sufficient to demonstrate that Rust code without "unsafe" blocks cannot observe unsafe behavior in your library.

See: https://plv.mpi-sws.org/rustbelt/popl18/paper.pdf

These concerns are not hypothetical, there have been soundness problems in the Rust standard library before and I expect it to happen again.

Proving the correctness of unsafe code is totally different from what you talked about, which was composing different abstractions with unsafe internals together.

Users of safe Rust do not need to worry about whether the composition of two safe interfaces that use unsafe internally is safe unless one of those interfaces is incorrect. Your comment would suggest that users need to think about the untyped invariants of each library they use, but this is not correct, libraries are not allowed to rely on untyped invariants for the correctness of their safe APIs.

The problem with talking about this subject is that "safe" and "unsafe" are overloaded terms in Rust, so I can understand why you think I was talking about something different.

Let R be arbitrary Rust code with no "unsafe" blocks. Let X and Y be libraries with "unsafe" blocks. You can prove that R + X is safe, and prove that R + Y is safe, but you haven't yet proven R + X + Y is safe. This is the hard part, because without an understanding of what property of X and Y individually makes R + X + Y + Z + ... safe, we don't have a good definition for what makes an interface "safe".

And this is what I mean when I say that this is not only a pedagogical problem.

You have restated your position, but it is still incorrect in the context of this discussion. Even your original statement of "R be[ing] arbitrary Rust code with no 'unsafe' blocks" is problematic: any Rust code is, very unavoidably, built upon a foundation of unsafe code. It has to be, because it's running on an "unsafe" processor. And yet, any safe Rust code in the core library (barring a safeness bug) is obviously safely composable with any other safe Rust code precisely because it obeys safety guarantees when transitioning from unsafe to safe. The fact that you can mistakenly conceive of Rust code that somehow avoids any internal unsafety simply reinforces how obvious this simple fact is.

But using your original problem statement, if R is safe and X and Y use unsafe code but do not expose any unsafe interfaces, then either R + X + Y is safe or one of [X, Y] has a safety bug and is inaccurately marking an unsafe interface as safe.

This is a generally unsolvable problem, and every other language has this problem as well; the difference being that in most other languages you're typically forced to write the unsafe code in C (where one has much greater variety of footguns available at their disposal). If I write a Ruby FFI wrapper for buggy C code whose interfaces bleed "unsafe" (from the perspective of the Ruby VM) behavior, then I am liable to experience crashes and memory corruption bugs. The only difference here is that Rust allows you to break the seal on the warranty without switching to a different language.

> ...is obviously safely composable with any other safe Rust code precisely because it obeys safety guarantees when transitioning from unsafe to safe.

And what are those safety guarantees? This is the part where I see a lot of handwaving.

> ...either R + X + Y is safe or one of [X, Y] has a safety bug and is inaccurately marking an unsafe interface as safe.

Correct, but the problem is that we don't have a way to identify which library is incorrect without a definition for what a "safe interface" is. If R + X were unsafe or R + Y were unsafe we would have an easy answer to that question.

> This is a generally unsolvable problem...

The fact that the problem is unsolvable in general did not stop people from inventing the Rust language in the first place. The point of Rust is to solve this problem for a larger and more useful class of programs. Likewise, the research into defining what a "safe interface" is in Rust is important and useful research, e.g., RustBelt.

On a minor note, these kind of negative interactions with individual Rust community members have given me a bad impression of the Rust community as a whole.

> And what are those safety guarantees? This is the part where I see a lot of handwaving.

I think this is the contention: correct me if I'm wrong, but you're saying, that, in practice, the safety guarantees of Rust are currently too nebulous to be able to be enforced reliably, whereas most other people in this thread are, I think, visualising the "platonic Rust"/post-RustBelt Rust where the currently vague conditions for safety have been tweaked as needed and proved correct, treating the current situation more like a "just" bug (and the success of RustBelt so far hints that this isn't vapourware/imagination, there's significant concrete progress towards it).

That is to say, most people are talking about the potential of Rust's safety, whereas you're talking about the reality, right now. I think both positions are reasonable to think about, but it obviously leads to confusion when the positions aren't distinguished in a discussion. (I also think that most people would agree with you about Rust right now: there isn't a definite set of safety rules, so it can be hard to work out whether "edge-cases" are correct or not.)

> And what are those safety guarantees?

It is still that we are still working this out; this is what we're cooprating with academia on, formalizing the exact semantics. Such things take time.

> One known trap is that it is not sufficient to demonstrate that Rust code without "unsafe" blocks cannot observe unsafe behavior in your library.

I'm curious: what does this mean/could you point me to the part of the paper that describes it? (Unfortunately, I don't have time to read all 34 pages at the moment.)

p. 66:3, the two paragraphs starting with "However, there is cause for concern..."
Thanks!

I'm not convinced that the statement in the paper translates into what you said: the key piece of that paragraph is "or seems to be". The Leakpocalypse problem was one piece of code (crossbeam's scoped threads API) was relying on an invariant that doesn't actually hold ("destructors will always run"). It was, fundamentally, a bug in the `unsafe` code in crossbeam, meaning it was incorrect for crossbeam to call its API safe: the fact that it took multiple libraries to trigger in that case means nothing, it just happens to be the circumstances under which the problem was noticed.

Of course, to be fair, no-one had thought about this destructor property before, just implicitly relied on it, and so it does demonstrate the necessity for better understanding of/tools for unsafe code, which is what projects like RustBelt are pushing towards.

To summarise, I still don't see how these two sentences are different:

> no unsafe behavior is observable by clients of the library

> [clients] without "unsafe" blocks cannot observe unsafe behavior in [the] library

Indeed, I don't think it makes sense to even attempt to prove that clients with unsafe code can't observe unsafe behaviour (which seems to be the only way for the second sentence to differ from the first). The typical framing is that the safe code can be arbitrarily bad and there'll still be no undefined behaviour, but arbitrary `unsafe` can do anything, including writing directly to another library's data structures, which of course can easily cause UB (e.g. replace a Vec's data pointer with a null one).

> To summarise, I still don't see how these two sentences are different: ...

To "observe unsafe behavior" means I can write a program that does something safe, e.g., a data race or invalid memory access. It's possible to write library X and Y in such a way that I can observe unsafe behavior using both X and Y in my program, without putting "unsafe" blocks in my program. This is possible even if I can't do the same thing with either X or Y alone.

This is surprising, because it means that the naive definition of "safe interface" is not actually safe enough!

I'm still not understanding: other than the hardware/global state thing in another comment in this thread[1], what's a program that demonstrates this "composing safe interfaces is unsafe" property? The example in the paper is not one, it was a bug for crossbeam to mark its API safe.

[1]: I'm ignoring this case, because it's somewhat completely impossible to solve: there's no way Rust (or any language) can control this situation. And, there's a strong argument in my mind that this sort of scenario should have an `unsafe` constructor or something, to act as an assertion from the programmer that they're guaranteeing unique access to the resource.

The definition of what you're allowed to do with `unsafe`, however nebulous its specifics may be at the moment, is that such a situation is a bug in one or both of those libraries, not their composition.

To put it another way, if you can't observe unsafety with X or Y alone, but you can with both together, then at least one of them has given you a new capability that you did not have before. Either that new capability is not truly safe, and thus the bug is providing that capability, or it exposes the other library relying on something not truly safe, and thus the bug is relying on that property.

The important point here is that, by definition, at least one of X or Y will have to change when such a situation is discovered, in order to preserve the property that composing safe interfaces is safe.

Correct me if I’m wrong but I thing GP was stating that composing two “unsafe” blocks together (both of which are manually verified to work well) might interfere with each other when run simultaneously.
'unsafe' doesn't mean unsafe. Unsafe means "I can't convince the compiler that this is safe. But in my context, it is."

If there is any way in which a function containing an `unsafe` block may be used unsafely (specifically, violating memory-safety), then that function must also be marked as unsafe.

That's what it means if you use it properly. If you write bad code, it means "this code will break everything and the compiler won't protect you." An `unsafe` block does nothing to guarantee that you're doing something safe, which is what you seem to be saying, even if it's not what you mean to say.
This is pretty much tautological, and nobody's arguing this point. However, you have the benefit of being able to narrow your search scope to the parts of your code marked `unsafe` instead of the entire project.

Most things don't need unsafe code. For the things that do, you must yourself uphold the invariant that all requirements of safety are being obeyed when transitioning out of an unsafe block. If you don't do this, bad things can happen. Other languages don't have this because they either don't offer Rust's safety guarantees in the first place, or the only way to circumvent them is to write code in C.

> However, you have the benefit of being able to narrow your search scope to the parts of your code marked `unsafe` instead of the entire project.

I may be missing some context, but this is certainly not true in Rust. In order to understand whether an individual piece of code marked `unsafe` is actually correct, you need to examine the context in which it is run and in general you could have to examine a large section of "safe" code in order to figure out whether the "unsafe" block is correct. Usually you will have to examine the entire module.

Yes, it's limited to the module. That's still limited, and a good reason to keep such modules small.
Its possible that I misread the comment; it seemed to state that this problem extended into safe Rust, which it definitely does not.
Think of two libraries that use unsafe Rust and interact with the same hardware, but work correctly when used on their own.

A program written only in pure not-unsafe Rust might use these two libraries in a way that breaks because the assertions the programmers of the libaries had, like for example having exclusive access to the hardware, are wrong now.

One could argue the pure not-unsafe Rust program is wrong, not the libraries.

I think klodolph's comment is very thoughtful and shows a good deal of experience and domain knowledge.

There is a conflation happening here. What is the nature of this bug when you compose these two libraries together?

If it is a violation of Rust's safety guarantees, then at least one of those libraries has a bug, it is exposes a safe abstraction which is not actually safe. One could not argue that the safe Rust program is wrong; the library exposing an unsafe interface as safe is unarguably wrong.

If the library just behaves incorrectly in a manner disconnected from the type system because some global state was changed in a way it doesn't expect ("the hardware" in this case), then that's a normal bug & it is not connected to unsafe code at all.

> then at least one of those libraries has a bug

Yes, we agree about this point. However, the process for determining if these bugs exist is not well understood. That's what I mean when I say that this is not only a pedagogical problem--even Rust experts struggle to prove that a library containing "unsafe" blocks is safe, and more research into the area is needed.

My apologies if I misunderstood you - I read your comment as suggesting that safe abstractions are "leaky" and therefore create additional responsibilities for users to validate that they are using them safely when composing them together. This is not the case unless those abstractions are incorrect - which is the same situation you are with any language, just most languages those abstractions exist within the language runtime & not in libraries.