by nathanrf 976 days ago
It is unsound to transmute `&'a T` into `&'static T`, but it is not UB - as long as all of the subsequent uses of the transmuted reference obey the "real" lifetime of the original reference:

    fn example<'a>(r: &'a mut i32) -> &'static mut i32 {
        unsafe { std::mem::transmute(r) }
    }
    
    
    fn main() {
        let mut x: i32 = 5;
        let ptr: &'static mut i32 = example(&mut x);
        *ptr = 6;
        println!("{x}");
    }
(because it's unsound, it's considered wrong to do this - you should not intentionally write functions whose types are lies, and this one definitely lies, so it should be marked `unsafe` - but this is not automatic UB)
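For what it's worth, here is a rough sketch of what the "honest" version might look like, with the function marked `unsafe` and its contract written down. The name `extend_lifetime` and the helper `bump` are made up for illustration; they are not from the playground link.

```rust
/// Extends the lifetime of a mutable reference (hypothetical helper).
///
/// # Safety
/// The caller must only use the returned reference while the original
/// referent is still alive and not accessed through any other path.
unsafe fn extend_lifetime<'a>(r: &'a mut i32) -> &'static mut i32 {
    // SAFETY: the obligation is passed on to the caller, per the contract above.
    unsafe { std::mem::transmute::<&'a mut i32, &'static mut i32>(r) }
}

fn bump() -> i32 {
    let mut x: i32 = 5;
    {
        // SAFETY: `ptr` is only used inside this block, while `x` is alive
        // and not touched through any other reference.
        let ptr: &'static mut i32 = unsafe { extend_lifetime(&mut x) };
        *ptr = 6;
    }
    x
}

fn main() {
    println!("{}", bump());
}
```

The signature still claims `'static`, but now the `unsafe` keyword tells callers that the type alone is not the whole story.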

https://play.rust-lang.org/?version=stable&mode=debug&editio...

You can run this through Miri and confirm there's no UB even though we're modifying through `ptr`, whose lifetime has been extended beyond the length of the function.

However, Rust does have extra guarantees here which make this irrelevant to the pessimization problem in the linked article - you cannot ever legally convert a `&T` into a `&mut T` - this is always UB. This means that Rust guarantees that `example` does not modify `x` (unless e.g. it contains an `UnsafeCell`, like a `Mutex`'s contents), and so it does not need to defensively reload its value.

That is to say: Rust, just like C++, makes it legal (but frowned upon) to "leak" a reference beyond the stated lifetime it's provided as. But unlike C++, it is (always!) illegal to "upgrade" a `&T` into a `&mut T`, and thus the fact that it escapes does not hinder other optimizations.
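To make the `UnsafeCell` exception above concrete, here is a small sketch using `std::cell::Cell` (which is built on `UnsafeCell`). This is the sanctioned way to mutate through a shared reference; the names `bump_shared` and `demo` are just for illustration.

```rust
use std::cell::Cell;

// Takes only a shared reference, yet may legally change the value,
// because `Cell` opts out of the "shared means immutable" guarantee.
fn bump_shared(counter: &Cell<i32>) {
    counter.set(counter.get() + 1);
}

fn demo() -> i32 {
    let counter = Cell::new(5);
    bump_shared(&counter);
    // The caller has to re-read the value here; it cannot assume it is still 5.
    counter.get()
}

fn main() {
    println!("{}", demo());
}
```

With a plain `&i32` instead of `&Cell<i32>`, there is no legal way to write `bump_shared`, which is exactly the guarantee the optimizer gets to rely on.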


Could you please expand on this last point? Would it not be the same in case of C++’s `const`s?
In C and C++, `const` on pointers/references is basically just a comment to programmers: it is part of the type, but doesn't "mean" anything to the abstract machine. The rules don't treat const and non-const references/pointers differently; the type system simply only lets you mutate through a non-const one.

Obviously, good code should treat it as more than just a comment - using `const` correctly clarifies intent and makes it possible to stay sane as a C++ developer, but the abstract machine doesn't care.

In C++, you can basically always `const_cast` a `const T&` into a `T&` and then modify it without causing UB. A function that accepts a `const T&` is just pinky promising that it will be polite and probably not do that.

It is only UB if the underlying object is "actually const", and even then, it doesn't cause UB until you actually perform the mutation; creating the mutable reference itself is perfectly fine.

For example, the following is perfectly legal:

    int& upgrade_to_mut(const int& x) {
      return *const_cast<int*>(&x);
    }

    int x = 5;
    const int& x_ref = x;
    int& x_ref_mut = upgrade_to_mut(x_ref);
    x_ref_mut = 6;

It's only invalid if the object that is pointed at is itself const, as in:

    int& upgrade_to_mut(const int& x) {
      return *const_cast<int*>(&x);
    }

    const int y = 5;
    const int& y_ref = y;

    // it is actually legal to produce y_ref_mut, but we cannot modify through it
    int& y_ref_mut = upgrade_to_mut(y_ref);

    y_ref_mut = 6; // this is UB: cannot modify a const object 'y'

The difference is that in Rust, "mutation capabilities" are part of references, and so you cannot create them out of nowhere, that would be UB. But in C++, mutation capabilities are part of the object being pointed at, so as long as they happen to be there when you perform the mutation (e.g. you're not modifying a string literal or a variable declared `const`) then there's no problem.
It's not entirely true that "const" is "just a comment," depending on the use case. On machines with super-limited RAM you can use const on globals to tell the compiler "put this in .rodata".

In other words, "const" (in a global context) can tell the compiler "you don't have to copy this to RAM, just read it directly from non-volatile storage." Obviously, that would be undesirable on a desktop computer, but if you're dealing with a wee little microcontroller, it's very helpful.

`const`-as-comment is specifically limited to pointers and references - `const` on objects definitely does change semantics (it is always UB to attempt to modify a `const` object).

Another good example is string literals (except when initializing a non-const `char[]` variable), which are often allocated in read-only data in the same way, since they are const objects too.

The comment specifically referred to references and pointers whereas your example does not.
I don't agree that it is just a comment to programmers. CppReference says,

> Modifying a const object through a non-const access path and referring to a volatile object through a non-volatile glvalue results in undefined behavior.

and both compilers have tried to take advantage of this in the past.

That is saying exactly the same thing that I said; the qualification of the pointer is irrelevant; you get UB when modifying a const object. In the code snippets above,

    int x = 5;       // not a const object, so can be freely modified
    const int y = 5; // is a const object, so cannot be modified
C/C++ pointer provenance doesn't include information about constness, so it doesn't matter "how" you got a mutable pointer to `x`: you're always allowed to modify `x` through that pointer, even if the pointer "came from" a const reference.

The reason for mentioning a "non-const access path" is that the type system forbids you from modifying through a const access path in the first place, so the program would already be rejected if you tried that.

I'm not saying it's a good idea to go around dropping const qualifications; `const_cast` is mostly evil and should be avoided. But at the level of the abstract machine, it's a no-op, even when going from const -> non-const, other than changing the type of the provided pointer.

The benefit of `const` is that if you don't use C-style casts that discard constness, and you don't use `const_cast`, and you don't use `mutable` or other type-unsafe or const-unsafe features, it's not possible to accidentally obtain a non-const pointer to a const object. Thus C++ actually helps you avoid this UB pretty well. But the fact that general conversion from const to non-const is permitted reduces the kinds of optimizations that can be performed.

Thanks! Do you know if the Rust compiler is able to pass this information to LLVM somehow?
It does pass this information to LLVM, in the form of the `readonly` attribute. This seems to be a bug in LLVM, which does not optimize the function properly; I don't know why.
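If you want to look at this yourself, something like the following sketch should do; compile it with `rustc --emit=llvm-ir` and inspect the attributes on the `&i32` parameter. The function names are hypothetical, and the exact attributes and the resulting codegen depend on the compiler and LLVM versions, so treat it as a starting point rather than a demonstration of any particular output.

```rust
// Hypothetical stand-in for a function the optimizer cannot see into.
#[inline(never)]
fn observe(r: &i32) -> i32 {
    *r + 1
}

fn caller() -> i32 {
    let mut x = 5;
    let seen = observe(&x);
    // `observe` only received `&i32`, so the compiler is entitled to assume
    // `x` is still 5 at this point without reloading it from memory.
    x += seen;
    x
}

fn main() {
    println!("{}", caller());
}
```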