Hacker News new | ask | show | jobs
by zozbot234 439 days ago
> which may cause an invariant to break, thus potentially causing memory unsafety in whatever required that invariant

By that standard anything and everything might be tainted as "unsafe", which is precisely GP's point. Whether the unsafety should be blamed on the outside code that's allowed to create a 0-valued NonZero<…> or on the code that requires this purported invariant in the first place is ultimately a matter of judgment, that people may freely disagree about.

3 comments

EDIT: A summary of this is that it is impossible to write a sound std::Vec implementation if NonZero::new_unchecked is a safe function. This is specifically because creating a value of NonZero which is 0 is undefined behavior which is exploited by niche optimization. If you created your own `struct MyNonZero(u8)`, then you wouldn't need to mark MyNonZero::new_unchecked as unsafe because creating MyNonZero(0) is a "valid" value which doesn't trigger undefined behavior.

The issue is that this could potentially allow creating a struct whose invariants are broken in safe rust. This breaks encapsulation, which means modules which use unsafe code (like `std::vec`) have no way to stop safe code from calling them with the invariants they rely on for safety broken. Let me give an example starting with an enum definition:

  // Assume std::vec has this definition
  struct Vec<T> {
    capacity: usize,
    length:   usize,
    arena:    * T
  }
  
  enum Example {
    First {
      capacity: usize,
      length:   usize,
      arena:    usize,
      discriminator: NonZero<u8>
    },
    Second {
      vec: Vec<u8>
    }
  }
Now assume the compiler has used niche optimization so that if the byte corresponding to `discriminator` is 0, then the enum is `Example::Second`, while if the byte corresponding to `discriminator` is not 0, then the enum is `Example::First` with discriminator being equal to its given non-zero value. Furthermore, assume that `Example::First`'s `capacity`, `length`, and `arena` fields are in the in the same position as the fields of the same name for `Example::Second.vec`. If we allow `fn NonZero::new_unchecked(u8) -> NonZero<u8>` to be a safe function, we can create an invalid Vec:

  fn main() {
    let evil = NonZero::new_unchecked(0);
  
    // We write as an Example::First,
    // but this is read as an Example::Second
    // because discriminator == 0 and niche optimization
    let first = Example::First {
      capacity: 9001, length: 9001,
      arena: 0x20202020,
      discriminator: evil
    }

    if let Example::Second{ vec: bad_vec } = first {
      // If the layout of Example is as I described,
      // and no optimizations occur, we should end up in here.

      // This writes 255 to address 0x20202020
      bad_vec[0] = 255;
    }
  }
So if we allowed new_unchecked to be safe, then it would be impossible to write a sound definition of Vec.
Yeah, anything can (and should) be marked unsafe if it could lead to memory safety problems. And so if it potentially breaks an invariant which is relied on for memory safety, it should be marked unsafe (conversely, code should not rely on an unchecked, safe condition for memory safety). That's basically how it works, Rust has the concept of unsafe functions so that libraries can communicate to users about what can and can't be relied on to keep memory safety without manual checking. This requires a common definition of 'safe', but it then means there isn't any argument about where the bug is: if the invariant isn't enforced by the compiler in safe code, then other code should not rely on it. If it is, then the bug is in the unsafe code that broke the invariant.
> Whether the unsafety should be blamed on the outside code that's allowed to create a 0-valued NonZero<…> or on the code that requires this purported invariant in the first place is ultimately a matter of judgment, that people may freely disagree about.

It's not, though. NonZero<T> has an invariant that a zero value is undefined behavior. Therefore, any API which allows for the ability to create one must be unsafe. This is a very straightforward case.