Unsafe Zig Is Safer Than Unsafe Rust (andrewkelley.me)
112 points by jfo 3075 days ago
15 comments

Rust is a language that offers you lots of compile time checks, and an escape hatch called unsafe that says “trust the programmer here.” Yes, it is possible—and easy—to make mistakes in the place where you have asked to be trusted, not checked.

We have a big pedagogical task ahead of us in teaching safe practices for unsafe Rust, and defensive coding practices in unsafe Rust.

We should also think about whether we can make unsafe Rust harder to misuse. There are improvements coming in compile time evaluation, and those can potentially make the compiler much stronger when it comes to detecting memory errors in unsafe code at compile time.

There's not only a pedagogical task here, but the Rust community must learn how to write code safely. The major difficulty here is that in general, unsafe pieces of code cannot be safely composed, even if the unsafe pieces of code are individually safe. This allows you to bypass runtime safety checks without unsafe code just by composing "safe" modules that internally use unsafe code in their implementation.

This kind of problem comes up a lot. Composed atomic operations are not atomic. Composed correct threaded code is not always correct. Mixing Scheme control structures made with call/cc doesn't work as desired. Enabling different Haskell language extensions gets you off the deep end quickly, and some unsafe combinations are surprising (see GeneralizedNewtypeDeriving, which is considered unsafe even though it used to be safe).
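
To make the "composed atomic operations are not atomic" point concrete, here is a minimal Rust sketch. The function names (`composed_increment`, `single_increment`, `run`) are made up for illustration. The first increment is built from two individually atomic operations and can lose updates under contention; the second is a single atomic read-modify-write and cannot.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Two atomic operations in sequence. Another thread can update the
/// counter between the load and the store, so increments can be lost.
fn composed_increment(counter: &AtomicU64) {
    let seen = counter.load(Ordering::SeqCst);
    counter.store(seen + 1, Ordering::SeqCst);
}

/// One atomic read-modify-write operation. No increment can be lost.
fn single_increment(counter: &AtomicU64) {
    counter.fetch_add(1, Ordering::SeqCst);
}

/// Runs `step` `iters` times on each of `threads` threads and returns
/// the final counter value.
fn run(step: fn(&AtomicU64), threads: u64, iters: u64) -> u64 {
    let counter = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..iters {
                    step(&counter);
                }
            })
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    counter.load(Ordering::SeqCst)
}

fn main() {
    // The single-operation version always reaches threads * iters.
    println!("single:   {}", run(single_increment, 4, 100_000));
    // The composed version may print a smaller number on some runs.
    println!("composed: {}", run(composed_increment, 4, 100_000));
}
```

Note that both versions are entirely safe Rust and free of data races; the composed one is simply not the operation the caller probably wanted.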

> The major difficulty here is that in general, unsafe pieces of code cannot be safely composed, even if the unsafe pieces of code are individually safe. This allows you to bypass runtime safety checks without unsafe code just by composing "safe" modules that internally use unsafe code in their implementation.

This comment suggests you don't have much domain knowledge about how `unsafe` in Rust works, so I'm surprised you speak with such confidence. Your comment is flatly wrong: users using only safe code are not responsible for guaranteeing the composed safety of the components they use (whether or not they are implemented with unsafe code).

Interfaces marked safe must uphold Rust's safety guarantees, or they are incorrect. They are just wrong if they have additional untyped invariants that need to be maintained to guarantee their safety; interfaces like this must be marked `unsafe`.

Because they cannot depend on untyped invariants, any correct implementation with a safe interface can be composed with any other. This ability to create safe abstractions over unsafe code which extend the reasoning ability of the type system is a fundamental value proposition of Rust.
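
As a rough illustration of a safe interface over unsafe internals, here is a small sketch. The function `pair_mut` is hypothetical, not something from the standard library. Every condition the unsafe block relies on is checked inside the function, so no caller in safe code can violate it.

```rust
/// Returns mutable references to two distinct elements of `slice`, or
/// `None` if the indices are equal or out of bounds.
fn pair_mut<T>(slice: &mut [T], i: usize, j: usize) -> Option<(&mut T, &mut T)> {
    if i == j || i >= slice.len() || j >= slice.len() {
        return None;
    }
    let ptr = slice.as_mut_ptr();
    // SAFETY: both indices are in bounds and distinct, so the two
    // references point at different elements and never alias. Both
    // borrows are tied to the lifetime of `slice`.
    unsafe { Some((&mut *ptr.add(i), &mut *ptr.add(j))) }
}

fn main() {
    let mut values = [1, 2, 3];
    if let Some((first, last)) = pair_mut(&mut values, 0, 2) {
        std::mem::swap(first, last);
    }
    assert_eq!(values, [3, 2, 1]);
    // Asking for the same element twice is refused rather than trusted.
    assert!(pair_mut(&mut values, 1, 1).is_none());
}
```

If the bounds or distinctness check were left to the caller instead, the function would have to be declared `unsafe fn`, which is the rule described above.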

> This comment suggests you don't have much domain knowledge about how `unsafe` in Rust works, so I'm surprised you speak with such confidence.

I hate being tone police, but jeez, we're having a discussion about Rust here and talking about my personal competency is inappropriate and unwelcome.

The problem I'm talking about happens when you write libraries that contain "unsafe" blocks. You want to prove (or at least assure yourself) that no unsafe behavior is observable by clients of the library. However, the way to do this is not entirely clear, although there is research being done in this area. One known trap is that it is not sufficient to demonstrate that Rust code without "unsafe" blocks cannot observe unsafe behavior in your library.

See: https://plv.mpi-sws.org/rustbelt/popl18/paper.pdf

These concerns are not hypothetical: there have been soundness problems in the Rust standard library before, and I expect it to happen again.

Proving the correctness of unsafe code is totally different from what you talked about, which was composing different abstractions with unsafe internals together.

Users of safe Rust do not need to worry about whether the composition of two safe interfaces that use unsafe internally is safe unless one of those interfaces is incorrect. Your comment would suggest that users need to think about the untyped invariants of each library they use, but this is not correct, libraries are not allowed to rely on untyped invariants for the correctness of their safe APIs.

The problem with talking about this subject is that "safe" and "unsafe" are overloaded terms in Rust, so I can understand why you think I was talking about something different.

Let R be arbitrary Rust code with no "unsafe" blocks. Let X and Y be libraries with "unsafe" blocks. You can prove that R + X is safe, and prove that R + Y is safe, but you haven't yet proven R + X + Y is safe. This is the hard part, because without an understanding of what property of X and Y individually makes R + X + Y + Z + ... safe, we don't have a good definition for what makes an interface "safe".

And this is what I mean when I say that this is not only a pedagogical problem.

> One known trap is that it is not sufficient to demonstrate that Rust code without "unsafe" blocks cannot observe unsafe behavior in your library.

I'm curious: what does this mean/could you point me to the part of the paper that describes it? (Unfortunately, I don't have time to read all 34 pages at the moment.)

p. 66:3, the two paragraphs starting with "However, there is cause for concern..."
Correct me if I’m wrong but I think GP was stating that composing two “unsafe” blocks together (both of which are manually verified to work well) might interfere with each other when run simultaneously.
'unsafe' doesn't mean unsafe. Unsafe means "I can't convince the compiler that this is safe. But in my context, it is."

If there is any way in which a function containing an `unsafe` block may be used unsafely (specifically, violating memory-safety), then that function must also be marked as unsafe.
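
A minimal sketch of that rule, using made-up function names: `read_at` can be misused by passing a bad index, so it is declared `unsafe fn` and documents its contract, while `read_at_checked` enforces the contract itself and can therefore be a normal safe function.

```rust
/// Reads the element at `idx` without a bounds check.
///
/// # Safety
/// `idx` must be less than `data.len()`.
unsafe fn read_at(data: &[u32], idx: usize) -> u32 {
    // SAFETY: the caller promises that `idx` is in bounds.
    unsafe { *data.as_ptr().add(idx) }
}

/// Safe wrapper: performs the check that `read_at` requires.
fn read_at_checked(data: &[u32], idx: usize) -> Option<u32> {
    if idx < data.len() {
        // SAFETY: the bounds check above establishes the contract.
        Some(unsafe { read_at(data, idx) })
    } else {
        None
    }
}

fn main() {
    let data = [10, 20, 30];
    assert_eq!(read_at_checked(&data, 1), Some(20));
    assert_eq!(read_at_checked(&data, 3), None);
}
```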

That's what it means if you use it properly. If you write bad code, it means "this code will break everything and the compiler won't protect you." An `unsafe` block does nothing to guarantee that you're doing something safe, which is what you seem to be saying, even if it's not what you mean to say.
It's possible that I misread the comment; it seemed to state that this problem extended into safe Rust, which it definitely does not.
Think of two libraries that use unsafe Rust and interact with the same hardware, but work correctly when used on their own.

A program written only in pure not-unsafe Rust might use these two libraries in a way that breaks because the assumptions the programmers of the libraries made, for example having exclusive access to the hardware, are now wrong.

One could argue the pure not-unsafe Rust program is wrong, not the libraries.
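
One way to turn "exclusive access to the hardware" from an unstated assumption into something the type system can see is a claim-once handle. The sketch below is only an illustration: `Device`, `claim`, and `write` are hypothetical names, and the pattern only helps if every library touching the hardware agrees to go through the same handle type.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set once the (hypothetical) device has been claimed.
static CLAIMED: AtomicBool = AtomicBool::new(false);

/// Handle representing exclusive access to a hypothetical device.
/// The private field means it can only be created through `claim`.
pub struct Device {
    _private: (),
}

impl Device {
    /// Returns the handle on the first call and `None` on every later call.
    pub fn claim() -> Option<Device> {
        if CLAIMED.swap(true, Ordering::AcqRel) {
            None
        } else {
            Some(Device { _private: () })
        }
    }

    /// Stand-in for a register write. Taking `&mut self` means only the
    /// holder of the handle can call it, and only from one place at a time.
    pub fn write(&mut self, value: u8) -> u8 {
        value
    }
}

fn main() {
    let mut device = Device::claim().expect("first claim succeeds");
    assert_eq!(device.write(7), 7);
    // A second library asking for the same device gets a visible refusal.
    assert!(Device::claim().is_none());
}
```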

I think klodolph's comment is very thoughtful and shows a good deal of experience and domain knowledge.

> see GeneralizedNewtypeDeriving, which is considered unsafe even though it used to be safe

This is wrong. GND plus TypeFamilies or some other extension in that vein used to be unsound when combined. It has since been fixed via the introduction of type roles.

> Composed atomic operations are not atomic.

Incidentally, Haskell also has this figured out via the STM monad.

I hadn't heard about the advancement with Roles, but it seems that GND is still prohibited in "Safe Haskell"?
You’re probably reading an old document - this restriction was removed with the introduction of roles. https://downloads.haskell.org/~ghc/7.8.4/docs/html/users_gui...

Roles were introduced in 7.8.something, and GND was added to Safe.

Rust's approach to "unsafe" is to let the programmer do whatever they want. Having to use this for UNIX-type API calls is kind of lame.

I once proposed extending C to allow talking about array sizes.[1] You'd define "read" as

    int read(int fd, char &buf[len], size_t len);
The compiler now knows that "buf" is an array with length "len", and can check calls for "buf" being the right size. The generated code for the call is the same; this doesn't require array descriptors. It just says which parameter defines the length of the array.

All the original UNIX calls and most of the Linux ones fit into that simple model. If the size of something is hard to define simply at an API call, the API has a problem.

Rust's system for external C calls should be more like that and less about casts to raw pointers. It's technically possible to fix this in C, and have a "strict mode", but the political problems are too hard.

[1] http://www.animats.com/papers/languages/safearraysforc43.pdf

> Rust's system for external C calls should be more like that and less about casts to raw pointers.

It seems a rosy-eyed view to think that this would help safety significantly, and it would require a lot of effort: it's likely to be much lower pay-off than other things, like investing in, say, sanitizers or even just doing the work of writing safe wrappers for popular C libs, removing C FFI concerns from most people, who can just use the Rust library.

Specifically, as you say, C doesn't have this information, meaning there's no way for Rust's (or another language's) FFI to work like this automatically. Instead, someone will have to annotate the C code, have some extra "notes" layer, or annotate the imported Rust declarations. Either way, there's a human element, meaning a place for mistakes to be made. It seems like the less-duplicative way to do this is to make Rust wrappers that take Rust slices, since these will be wanted in the end anyway.

Of course you want to use Rust slices. Those map directly to the kind of C array I outlined. If you could declare a C API that way to Rust, you'd get the mapping without talking about pointers explicitly at all.

What I'm arguing for is a declarative way to talk about C interfaces that is consistent with Rust's model. This is better than using "unsafe" to construct C-type raw pointers. Yes, this is more restrictive and there will be some awful C APIs you can't describe. That's a good indication said C API is trouble.

What would make this "declarative way to talk about C interfaces" less error prone than something like this?

    extern "C" {
        fn read(fd: c_int, buf: *mut c_char, len: usize) -> isize;
    }

    pub fn read_slice(fd: c_int, buf: &mut [c_char]) -> isize {
        // The pointer and the length come from the same slice, so they always agree.
        unsafe { read(fd, buf.as_mut_ptr(), buf.len()) }
    }
Further, note that this is insufficient for an idiomatic Rust API. You would also want to wrap the file descriptor (perhaps not for all C APIs) and the return value (definitely applies to all C APIs). So it would really look more like this:

    pub struct File { fd: c_int }

    impl File {
        pub fn read(&self, buf: &mut [u8]) -> Result<usize, ReadError> {
            let r = unsafe { read(self.fd, buf.as_mut_ptr() as *mut c_char, buf.len()) };
            if r == -1 {
                Err(ReadError::from(errno))
            } else {
                Ok(r as usize)
            }
        }
    }
I can certainly imagine a way to do that declaratively, but not in a way that helps even this most basic of examples. (Also, note that constructing raw pointers is completely safe; `as_mut_ptr`, for example, is a safe method.)
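
To illustrate the parenthetical about raw pointers, here is a tiny sketch with a made-up helper, `write_second`. Creating and offsetting the pointer compiles outside of any `unsafe` block; only the dereference needs one.

```rust
/// Writes `value` into the second byte of `buf`.
fn write_second(buf: &mut [u8; 4], value: u8) {
    // Creating and offsetting raw pointers needs no `unsafe`.
    let base: *mut u8 = buf.as_mut_ptr();
    let second: *mut u8 = base.wrapping_add(1);

    // SAFETY: `second` points at index 1 of a live 4-byte array that
    // this function holds exclusively.
    unsafe {
        *second = value;
    }
}

fn main() {
    let mut buf = [0u8; 4];
    write_second(&mut buf, 7);
    assert_eq!(buf, [0, 7, 0, 0]);
}
```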
That's not bad. It would be useful to be able to use some kind of "C slice" in an extern fn declaration, so you could talk about arrays, rather than pointers. Same function call code, but more Rust-like syntax. Then you don't need unsafe imperative code at all.

This would put all the memory-risky stuff in declarations of external functions.

> I once proposed extending C to allow talking about array sizes.[1]

That would be a very useful, and relatively unobtrusive, extension to C. I've always liked the idea of a C "strict mode". I wish the political problems weren't so hard.

That's for local variables. Microsoft and Linus Torvalds didn't like it, because it's a way to suddenly cause unexpected stack growth of arbitrary size. That feature was made optional in C11, and Microsoft never implemented it.
FWIW Microsoft does have SAL annotations to do the same thing. For example fread's prototype is

    size_t fread(
        _Out_writes_bytes_(_ElementSize*_Count) void * _DstBuf,
        _In_ size_t _ElementSize,
        _In_ size_t _Count,
        _Inout_ FILE * _File
    );
https://docs.microsoft.com/en-us/visualstudio/code-quality/a...
C++ compilers also have references to arrays which can be abused in some cases:

    template < size_t len > int read(int fd, char (&buf)[len]); // array size will be inferred
    int read(int fd, char (&buf)[1024]); // array size must be exactly 1024
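
For comparison, a roughly analogous sketch in Rust, where the array length is likewise part of the reference type. The function names `zero_any` and `fill_1k` are invented for this example.

```rust
/// Accepts a buffer of any length; `N` is inferred at the call site.
fn zero_any<const N: usize>(buf: &mut [u8; N]) -> usize {
    buf.fill(0);
    N
}

/// Accepts only buffers of exactly 1024 bytes; any other length is a
/// compile-time type error.
fn fill_1k(buf: &mut [u8; 1024]) -> usize {
    buf.fill(0xFF);
    buf.len()
}

fn main() {
    let mut small = [1u8; 16];
    assert_eq!(zero_any(&mut small), 16);

    let mut big = [0u8; 1024];
    assert_eq!(fill_1k(&mut big), 1024);
    // fill_1k(&mut small) would not compile: expected [u8; 1024].
}
```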
C++ largely suffers from the same problems. Often a C++ programmer can write code which relies on iterators and containers which is quite safe and difficult to mess up, while for a variety of highly-specialized applications, mixtures of packed structs, pointer arithmetic, and arbitrary sequences of binary data need to be handled with utmost care.

Knowing when to use which set of tools and how to safely glue them together is important.

Now, I will say that the C++ community has been teaching safer, cleaner practices for years now and users seem to be largely adopting them. It works, as long as the developers don't pay a runtime or excessive development cost to do so.

[I'm sure a crustangelist is likely to come tell me that I can never write safe C++ code and that the universe will hate me for eternity for not leaping to rust, but please, understand that I don't suffer from unsafe memory issues on the whole because modern C++ is quite safe. You won't convert me, but I'm also not trying to convert you.]

Container and iterator code is not safe at all since there is no bounds checking by default and no protection against iterator invalidation, which can both cause writes to memory outside the intended object and thus a catastrophic outcome.

There is no safe subset of C/C++ unless you just don't use pointers or references at all (and refrain from using any library that is not safe which includes large parts of the standard library like all the containers), or you write it in Rust or an equivalent language with lifetimes and linear types and automatically translate it to C/C++ somehow.

> unless you just don't use pointers or references at all (and refrain from using any library that is not safe which includes large parts of the standard library like all the containers)

It may seem far fetched, but it might be more practical than you'd think. The SaferCPlusPlus[1] library provides memory-safe implementations of the most commonly used standard library containers, and pointer types that reflect the lifetimes of their target objects. That is to say, there is a practical subset of C++ that is more closely comparable to safe Rust than is conventional C++.

[1] shameless plug: https://github.com/duneroadrunner/SaferCPlusPlus

To be fair, I believe you and the individual you are replying to are treating the word 'safe' differently. Correct; C++ doesn't have a built-in concept of "safe" that is compiler guaranteed and anything written in that, if it isn't written defensively at literally every line of code, falls on the library consumer to handle that.

Rust, CLR/JVM/interpreted languages are 'safe' because the compiler will flat out refuse to do things that are unsafe (with the exception of Rust and some non-interpreted languages allowing you to declare portions of code as 'unsafe'/'hold my beer'). Short of bugs in compiler/standard library, or unsafe code from libraries written in 'unsafe' languages that are consumed by safe languages (which usually requires a bug in the library, not a bug with how the library is called in the "safe" context, but not always), C++ is 'not safe at all' by comparison. I think if you swap the word 'safe', with 'reliable', that was what the individual you were replying to was getting at. 'Safe' in this context is: "The compiler put the foot-shooting-gun in a safe", vs. 'reliable' is "the gun is in my hand, has no safety, and a somewhat light trigger but it's aimed at the target, not my foot ... as far as I know".

You can handle pointers and references safely as well as use components of the standard library that don't do bounds (or a lot of other, "perfectly reasonable but missing for performance/philosophical reasons") checks, but it's up to you.

A really terrible analogy: it's illegal to drive a car where I live with either of the front passengers lacking a safety belt. Heck, you can't even build a car without a number of safety features that regulation requires. It's also got a number of features to help you avoid accidents. If you or someone screws up on the road, you're protected by the safety features and your mastering of driving. That's the 'safe' programming languages that most people use these days. C++/C is like my motorcycle. The only safety features it comes with rely entirely on my skill at not only "not making mistakes" but anticipating the mistakes of others -- I've had several close calls but have been able to maneuver around other distracted drivers/library maintainers, but if I'm not paying attention to everyone/everything around me I'm toast. And even then, some accidents are unavoidable that would have been survivable with a steel cage and a safety-belt[0].

[0] But damn, that bike is fast, and unlike C/C++, it's a lot more fun to use than the safer alternatives.

Reliable is a better word. C++ has lots of features to help you avoid accidents. It's just that you know that certain operations take certain levels of precautions. I like C++. I like the expressiveness it provides. Yes, some things are unsafe, but the amount of time I've spent finding segfaults or other memory errors since becoming proficient is an epsilon in relation to the amount of time I've spent getting all of my crazy template magic to fit into the right spots.
I'll be an anti-crustangelist and say that I've actually avoided moving a C++ project I've been working on over to Rust because (1) I'm finding that fixing some of the previous code's pre-C++xx practices is suitable enough and (2) I've only written a few small things in Rust up to this point and learning the 30% or so more that I'd need to in order to get things fixed would take more time and has more unknowns. Granted, it's not a large application, the code is easy to follow in its current state, and it wasn't "already a mess" when I started working on it. Fixing its problems has required careful review of the code-base but it was certainly possible and practical to correct the issues with this (small) application; even by someone as weak in C++ as I am[0].

C++ has improved quite a bit, from my perspective, anyway.[1] That said, I'm excited about Rust and have started (shallowly) exploring it. I like what I see, so far; particularly with improvements on the ergonomics of the language. Seeing it put to use in major projects (cough Firefox) successfully and reading about the problems it solved for Mozilla is the main reason I've set a goal to become proficient in it this year. It's a tall order to commit to a new language, particularly when the other languages I write in generally do everything I need them to. There's a small number of things, though, that still pull me toward C++, and I'd rather have an alternative.

As pleasantly surprised as I was with C++, I had plenty of four-letter-word-riddled moments. Practically all of it stemmed from old libraries, or legacy pieces/parts with my favorite being "let's look at the documentation to see what kind of string this method expects/returns". Character encoding, character byte-sizes, differences between byte-length and semantic length are all complexities when dealing with strings -- many of which get hidden away by CLRs or JVMs or script interpreters. And I'm sure there's some reasons that a person with moderate C++ knowledge could tell me as to why so many of the recently developed (proprietary) libraries seemed to love to pass pointers to non-unicode character arrays around (performance? comfort? nationalist? satan worship?), but it was a punch in the face when I knew an "easy" std::string was right there and never needed to be a character array/serve as a buffer/do anything but be a unicode string for a brief moment of existence. And if I have to figure out why Hunter failed to download the boost library because someone statically linked it to cURL without https support, or used the built-in implementation and compiled it with the wrong flags, or for whatever reason, the downloaded version fails the SHA1 check Every. Single. Time. ... well, no need to conclude that one.

Heck, I'd argue crates is a C++ killing feature for me. Yes, Hunter can be made to work (kicking and screaming, sometimes) with cmake, which I'm told can also be made to work. Microsoft has one, too (I can't remember its name and I know they were working on making it possible to just "use NuGet"[2]), but I've always felt that a lack of easy dependency retrieval and management caused three problems: (1) people use old libraries that are very likely to be present on the target build host, (2) people write their own (poor, naive) implementations for Solved Problems(tm) or (3) the miserable fck doesn't build, there's not enough documentation to figure out what in blue-blazes <qwertyuio.h> is, who wrote it and where it came from and when you do finally find it, it won't build because it's missing its dependencies, so pick (1) or (2) or give up. Compared against '(package-manager) install (package)' and hey, I'm writing code like I originally set out to!

Wow, this devolved pretty quickly into a rant. My apologies for that -- it really isn't as bad as I've made it sound and I realize that most/all of these are my problems and I'm not knocking a language (or folks who program in it) for not bending to my will and having every feature I want, but I'm hopeful for what's coming around with Rust, D and others that are tackling the systems programming space. This Zig article caused me to read several others, as well. The compile-time variables as a workaround for lack-of-macros[3] looks like an interesting idea -- I'm not sure if the syntax is clear enough (globals are implicitly compile-time) but since it's a somewhat unfamiliar syntax, I lack experience to speak intelligently on that.

[0] My adventure started with troubleshooting a very consistent memory leak that was generally caused by some code-in-a-loop that failed to delete things. Often the solution was to change code to use something from boost (which it took a hard dependency on, anyway) or wrapping it in a class and RAIIing my way to a better reality. (and can we get a new acronym? I always write RIAA and if I don't write it, I see it and hairs on my neck stand up)

[1] I "gave up" C++ development around 2001 and short of reading code on rare occasion, didn't seriously start working in the language again until a couple of years ago. I felt like I was writing in a different language -- not sure if that was perception having been away from it for so long, or if it really was that different -- it took a lot of reading to get to a point where I was comfortable breathing in the direction of the code I was playing with.

[2] And NuGet could be a good option, here, especially if they move away from its roots of being somewhat of "it's really just a powershell script with a kludgy metadata file" since I'd rather not add yet another shell to my non-Windows hosts that already have alternatives that I prefer. Last I looked -- .Net Standard pre-2.0, they were fixing the metadata problems -- and maybe they weren't all that bad to begin with considering I can't think of the last time NuGet got in my way on a .Net app.

[3] Though, ideally, I want both.

edit: fix some bad footnote pointers - sheesh, can't even write a comment without a segfault

Well, if you want to write a tree using indices in a local array instead of pointers for better locality and memory footprint (which is ideal for many situations), you run into the difference between language pointers and computer science pointers. That's not even something that Rust will be smart enough to help you do properly.
In fact (unless something has changed dramatically without my getting the memo), that's how you would write many data structures in Rust---either for better memory behavior or you have cycles and Rc won't cut it. You have to use a vector of nodes and indices as pointers; if you try using references, the borrow checker comes and kicks sand in your face.
It's doable to keep using pointers instead of indices- they just all have the lifetime of the vector and you can freely follow them around.

This does prevent resizing the vector, but you can get around that by using a different arena that allocates in chunks rather than reallocating (and thus doesn't require a unique reference for .insert).

Rust can totally help you do that properly. You can wrap the indices in a struct parameterized by a lifetime and regain all the same tools you would have with language pointers.
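
A minimal sketch of the "vector of nodes with indices as pointers" layout discussed above, using a newtype for the index. All names here (`NodeId`, `Node`, `Tree`, `sorted`) are made up for illustration, and this version uses a plain newtype rather than the lifetime-parameterized wrapper suggested above.

```rust
/// Index into `Tree::nodes`, wrapped so it cannot be confused with
/// an arbitrary integer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeId(usize);

struct Node {
    value: i32,
    left: Option<NodeId>,
    right: Option<NodeId>,
}

/// Binary search tree whose nodes all live in one contiguous vector.
#[derive(Default)]
struct Tree {
    nodes: Vec<Node>,
}

impl Tree {
    fn new_node(&mut self, value: i32) -> NodeId {
        self.nodes.push(Node { value, left: None, right: None });
        NodeId(self.nodes.len() - 1)
    }

    /// Inserts `value` below `root`, following indices instead of pointers.
    fn insert(&mut self, root: NodeId, value: i32) -> NodeId {
        let new = self.new_node(value);
        let mut cur = root;
        loop {
            let node = &mut self.nodes[cur.0];
            let slot = if value < node.value { &mut node.left } else { &mut node.right };
            match *slot {
                Some(next) => cur = next,
                None => {
                    *slot = Some(new);
                    return new;
                }
            }
        }
    }

    /// Appends the values below `id` to `out` in ascending order.
    fn in_order(&self, id: Option<NodeId>, out: &mut Vec<i32>) {
        if let Some(NodeId(index)) = id {
            let node = &self.nodes[index];
            self.in_order(node.left, out);
            out.push(node.value);
            self.in_order(node.right, out);
        }
    }
}

/// Builds a tree from `values` and returns them in sorted order.
fn sorted(values: &[i32]) -> Vec<i32> {
    let mut tree = Tree::default();
    let mut root = None;
    for &value in values {
        match root {
            None => root = Some(tree.new_node(value)),
            Some(r) => {
                tree.insert(r, value);
            }
        }
    }
    let mut out = Vec::new();
    tree.in_order(root, &mut out);
    out
}

fn main() {
    assert_eq!(sorted(&[5, 2, 8, 1]), vec![1, 2, 5, 8]);
}
```

The vector can grow while `NodeId` values are held, which is the property that plain references into the vector would not allow. The trade-off is that a stale or foreign `NodeId` is a logic error (or an index panic) rather than a compile error.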
I would suspect that this particular issue is not necessarily a defensive coding problem.

https://gist.github.com/andrewrk/182ace5dee6c4025d8c4b0ca22c...

https://github.com/andrewrk/libsoundio/blob/fc96baf8130b52ba...

I've written that code before, and I know better (but then, all the world's an x86 box, right?) But first, I'm not sure how to make that code not broken (yes, that's an education issue), and second, the same arguments can be made about all the issues Rust is designed to prevent.

This really should be a compiler warning.

It will be interesting to watch the proportion of safe/unsafe code in large Rust codebases over time.
At a guess, it will increase until the necessary-but-not-currently-handled constructs are dealt with (arena-based memory management, I'm looking at you) and then decrease asymptotically. Already in Rust, if you adopt C++ STL idioms (and don't want to squeeze more performance out) and don't need to visit currently-unwrapped interfaces, you won't need unsafe at all.

Rust is a very good C++ replacement.

To put a point on this question: should an unsafe language (or language subset) be as safe as possible, or as unsafe (i.e. powerful) as possible?
Unsafe Rust is a superset, not a subset, incidentally.
It seems to me that "unsafe" Rust is a subset of Rust as a whole. Unless the Rust language does not support unsafe code at all?
"Unsafe" Rust is a superset because everything you can do in normal "safe" Rust, you can do within an "unsafe" block. That is, being within an "unsafe" block (which is what people mean by "Unsafe Rust") allows you to do more, not less.
> It seems to me that "unsafe" Rust is a subset of Rust as a whole.

Sure. When people say "Rust" they usually mean "safe Rust". But if we consider "Rust" as a whole, "Safe Rust", and "Unsafe Rust", then:

Rust is Unsafe Rust

Safe Rust is a subset of Unsafe Rust (and therefore Rust).

I think the Rust example is not how you should write such code. Why not start with the struct, and cast to a void* or a char* when C code requires it? I.e., the buggy example becomes:

  #[derive(Copy, Clone, Debug)]
  #[repr(C)]
  struct Foo {
      a: i32,
      b: i32,
  }

  fn main() {
      let mut array = [Foo { a: 0x01010101i32, b: 0x01010101i32 }; 256];
      let foo = &mut array[0];
      foo.a += 1;
  }
The unsafe section isn't even required, and the effect is the same. And I don't think this violates the spirit of his example, either. Consider the author's first link to a real-world occurrence of this:

            let size = mem::size_of::<FILE_NAME_INFO>();
            let mut name_info_bytes = vec![0u8; size + MAX_PATH];
            let res = GetFileInformationByHandleEx(handle,
                                                FileNameInfo,
                                                &mut *name_info_bytes as *mut _ as *mut c_void,
                                                name_info_bytes.len() as u32);
This is again, IMO, the wrong way to do this. You should just cast a pointer to an instance of the FILE_NAME_INFO struct into a c_void; the structure will need to use #[repr(C)] and the code will still be unsafe due to the C FFI, but it will be correct (and a lot simpler). This is the same thing that you would do in C, were you to call this function:

  FILE_NAME_INFO file_name_info;
  GetFileInformationByHandleEx(
      handle,
      FileNameInfo,
      &file_name_info,
      sizeof(file_name_info),
  )
just in Rust.
While the approach you suggest usually works well, it doesn't in this case: FILE_NAME_INFO[1] uses a "flexible array member"[2] (although not the C99 version of it), which requires a dynamically sized character array in the struct's allocation and has the callee write directly to the memory after the struct instance. The 'WCHAR FileName[1];' field at the end of the struct is just a placeholder to allow easy access to that character array; the length 1 is a lie.

[1]: https://msdn.microsoft.com/en-us/library/windows/desktop/aa3...

[2]: https://en.wikipedia.org/wiki/Flexible_array_member
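
To show the shape of the problem without the Windows API, here is a hedged sketch with a made-up header type, `NameInfo`, that mimics the "struct followed by a variable-length array" layout. It is not FILE_NAME_INFO itself. The buffer is allocated as `u32` words rather than bytes so that it is guaranteed to be aligned for the header, and the trailing data is only ever accessed through raw pointers derived from the whole allocation.

```rust
use std::mem::{align_of, size_of};

/// Hypothetical header followed by a variable-length UTF-16 name.
#[repr(C)]
struct NameInfo {
    len_bytes: u32, // length of the name in bytes
    name: [u16; 1], // placeholder for the trailing array
}

/// Stores `text` behind a `NameInfo` header and reads it back.
fn round_trip(text: &str) -> String {
    let units: Vec<u16> = text.encode_utf16().collect();

    // Over-allocate slightly: the full header plus the name, rounded up to words.
    let bytes = size_of::<NameInfo>() + units.len() * size_of::<u16>();
    let words = (bytes + size_of::<u32>() - 1) / size_of::<u32>();
    let mut buf = vec![0u32; words];

    // A Vec<u32> is aligned for u32, which is all the header needs.
    assert!(align_of::<u32>() >= align_of::<NameInfo>());
    let header = buf.as_mut_ptr() as *mut NameInfo;

    // SAFETY: `header` is aligned and points at an allocation large enough
    // for the header and `units.len()` trailing u16 values. No reference to
    // `NameInfo` is created, only raw place accesses.
    unsafe {
        (*header).len_bytes = (units.len() * size_of::<u16>()) as u32;
        let name = std::ptr::addr_of_mut!((*header).name) as *mut u16;
        std::ptr::copy_nonoverlapping(units.as_ptr(), name, units.len());

        let count = (*header).len_bytes as usize / size_of::<u16>();
        let stored = std::slice::from_raw_parts(name as *const u16, count);
        String::from_utf16_lossy(stored)
    }
}

fn main() {
    assert_eq!(round_trip("hello.txt"), "hello.txt");
}
```

With a `Vec<u8>` backing buffer the same cast would compile just as happily, but nothing would guarantee the alignment, which is the situation the article is about.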

Ugh. You're absolutely right. I never liked those even in C.

So, it seems like this is relatively easy to do on the stack, which is how the example does it presently. See the link below to my attempt; the stack allocation is still all safe code, still a single line. However, I presume that one will want to also create one on the heap, especially since in the example the author poses it would be a rather large stack allocation, and one might, quite reasonably, put that on the heap.

My attempt is here: https://play.rust-lang.org/?gist=1c50b35941506316372da860cae...

Couldn't avoid the unsafe for that, but, I was able to get rid of the transmute call, and transmute is a function where the warning on the tin is "this function is not just unsafe, it is radioactive". But the amount of code required still felt a bit lacking.

It seems these are an area of active work[1][2] currently.

I think there is still definitely a valid point that the author is hitting — that encoding more information into the program can allow the compiler to catch more classes of errors. (This is, after all, the very logic that gave us Rust.)

[1]: https://github.com/rust-lang/rfcs/pull/1909

[2]: https://github.com/rust-lang/rust/issues/18806

The function in question takes a pointer to a variable sized buffer that only starts with a struct. So your alternative won't work (the declared size of the struct only has room for a single character of filename). And there are certainly instances of this pattern where you really need to choose the size of the buffer at run time.
Transmute is like, the most unsafe thing possible. It basically checks if the two things have the same size, and that's it. You're responsible for everything else.

See all the warnings and suggested other ways to accomplish things with https://doc.rust-lang.org/stable/std/mem/fn.transmute.html

This is UB because `Foo` is not `#[repr(C)]`, in my understanding. I haven't checked if it works if you add the repr though. I don't think I'd expect it to.

> Transmute is like, the most unsafe thing possible.

Yes, the first rule of auditing Rust unsafe blocks is that if you see someone using std::mem::transmute, you walk over and ask the author if they're really certain what they're doing. :) However, it should be noted that std::mem::transmute still has some guard rails; the real "most unsafe thing possible" is the variant of this function that does away with those guard rails: std::mem::transmute_copy.

Required reading: https://doc.rust-lang.org/nightly/nomicon/transmutes.html
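
As a small illustration of the "suggested other ways" that the transmute documentation points to, here is a sketch with made-up wrapper names. For plain integer and float reinterpretation, the standard library has safe functions that do the same job as the transmute.

```rust
/// Reinterprets four bytes as a `u32` using `transmute`.
fn via_transmute(bytes: [u8; 4]) -> u32 {
    // SAFETY: both types are 4 bytes and every bit pattern is a valid u32.
    unsafe { std::mem::transmute::<[u8; 4], u32>(bytes) }
}

/// The same conversion with the safe standard-library function.
fn via_from_ne_bytes(bytes: [u8; 4]) -> u32 {
    u32::from_ne_bytes(bytes)
}

/// Safe replacement for transmuting an `f32` to its bit pattern.
fn float_bits(x: f32) -> u32 {
    x.to_bits()
}

fn main() {
    let bytes = [0x78, 0x56, 0x34, 0x12];
    assert_eq!(via_transmute(bytes), via_from_ne_bytes(bytes));
    assert_eq!(float_bits(1.0), 0x3f80_0000);
}
```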

I changed it to:

        let foo = &mut array[0] as *mut u8 as *mut Foo;
        (*foo).a += 1;
and the IR has the same undefined behavior: https://godbolt.org/g/5Bv3FL
Yeah I mean, to be clear, it's cool that Zig checks this stuff. Unsafe code is extremely dangerous, in a variety of ways.

Luckily, outside of FFI, it's very rare to actually need to write it, though that does of course depend on what exactly you're doing.

We hope, in the future, to basically have tooling here that can detect when you do something UB, and warn you. As we're still sorting out the memory model, etc, it's not here yet, but it's certainly on the agenda.

Wouldn't you need to make `Foo` a union for this to be defined anyway?
Rust `#[repr(C)]` is about representation in memory and function call ABI for the type. C rules about casting through union are not relevant to Rust code.
Clearly both Rust and Zig tackle tough problems and implement solutions that will have trade-offs. I don't think the top answer to a post talking about Zig's advantages should defensively try to point out how things could be different in Rust - if only you knew exactly what to do - instead it would be nice to see more discussion about other areas where Rust is perhaps better suited than Zig. For instance, Rust clearly handles memory/pointers better (?), while maybe Zig is easier to learn?
I think both "as" and using "transmute" for non-exceptional circumstances are mistakes in Rust.

There should instead be a bunch of type-specific cast operators that can check things like alignment and that what you intended to be a zero-extending integer cast is not in fact truncating to a smaller integer type, and so on.

It's not too late to deprecate "as" and discourage using "transmute" in favor of those.
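
A rough sketch of what such a checked cast could look like as a library function today. `cast_to_foo` and `Aligned` are hypothetical names, not an existing API, and `Foo` is assumed to be `#[repr(C)]` with two `i32` fields as in the article's example.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
struct Foo {
    a: i32,
    b: i32,
}

/// Hypothetical checked cast: returns None instead of invoking UB when the
/// buffer is too short or not aligned for Foo.
fn cast_to_foo(bytes: &mut [u8]) -> Option<&mut Foo> {
    if bytes.len() < size_of::<Foo>() {
        return None;
    }
    let ptr = bytes.as_mut_ptr();
    if (ptr as usize) % align_of::<Foo>() != 0 {
        return None;
    }
    // Safety: the pointer is aligned, covers size_of::<Foo>() initialized
    // bytes, and any bit pattern is a valid pair of i32 values.
    Some(unsafe { &mut *(ptr as *mut Foo) })
}

/// Wrapper that forces 4-byte alignment on the byte buffer (assumption:
/// 4 is at least align_of::<Foo>() on the target).
#[repr(align(4))]
struct Aligned([u8; 16]);

fn main() {
    let mut buf = Aligned([1u8; 16]);
    // One byte past an aligned address is misaligned for Foo.
    assert!(cast_to_foo(&mut buf.0[1..]).is_none());
    let foo = cast_to_foo(&mut buf.0[..]).expect("buffer is aligned");
    foo.a += 1;
    println!("{:?}", foo);
}
```

This is a runtime check, so it does not give you what Zig does at compile time; it only turns the UB into a `None`.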

This isn't about transmute or having a specific operator that checks alignment. The point is that the alignment is part of the type in Zig and, to a lesser degree, it's about having the comptime machinery for Zig to decide, when you offset a &align(4) u8 by an expression, whether the result should have type &align(1) u8, &align(2) u8, or &align(4) u8.
Exactly what I was thinking while reading the OP. It seems like it'd be possible to add alignment checks either manually via cast-operators or automatically via the compiler. Rust could at a minimum display a warning "possible alignment errors" when emitting that kind of LLVM IR.
The x86 ABI enforces alignment of the stack to 16 bytes. Isn't that enough to make this particular problem go away?
No. Nothing guarantees that the array is aligned within the stack frame, even if the stack frame is aligned. What if the compiler introduced a boolean flag (for instance, a drop flag) immediately before the array, in the same stack frame?
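
If you cannot guarantee the alignment, one option (a sketch, not the only approach) is to avoid aligned loads entirely with `std::ptr::read_unaligned` and `write_unaligned`. `bump_a` is a made-up helper name, and `Foo` is assumed to match the article's two-`i32` struct.

```rust
use std::mem::size_of;
use std::ptr;

#[repr(C)]
#[derive(Clone, Copy)]
#[allow(dead_code)]
struct Foo {
    a: i32,
    b: i32,
}

/// Hypothetical helper: increments the `a` field of a Foo stored at the
/// start of `bytes`, without assuming the bytes are aligned for Foo.
fn bump_a(bytes: &mut [u8]) {
    assert!(bytes.len() >= size_of::<Foo>());
    let p = bytes.as_mut_ptr() as *mut Foo;
    unsafe {
        // Copy the struct out, modify the copy, and copy it back.
        let mut foo = ptr::read_unaligned(p);
        foo.a += 1;
        ptr::write_unaligned(p, foo);
    }
}

fn main() {
    let mut array = [1u8; 1024];
    // Deliberately start one byte in, so the address need not be aligned.
    bump_a(&mut array[1..]);
    let a = i32::from_ne_bytes([array[1], array[2], array[3], array[4]]);
    println!("a = {:#x}", a);
}
```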
Good point, here. As is often said, when the documentation says "undefined behavior", it means the compiler can do whatever it wants, including "work just fine"; and sometimes it'll cause time travel[0]. Hence the "nasal demons" lore. Often, it'll cause optimizations to be applied that would have otherwise been avoided resulting in a bug that appears to occur somewhere else and a programmer to look at the result of execution and ... if it actually continues executing ... swear a lot. These are especially fun because the problem frequently won't appear in debug builds.

[0] https://blogs.msdn.microsoft.com/oldnewthing/20140627-00/?p=... - worth a read for some entertainment - basically what happens when the compiler assumes "undefined behavior" can't happen and optimizes accordingly.

Zig looks very interesting. There is only a TODO in the memory section of the documentation. From what I understand there is only manual memory management? I've seen there is a mention of custom allocators, any details? Any RAII-like concept? Or full manual memory management?
"Zig does not support RAII or operator overloading because both make it very difficult to tell where function calls happen just by looking at a function body."

more in the 0.1.1 release notes! http://ziglang.org/download/0.1.1/release-notes.html

"Zig's standard library is still very young, but the goal is for every feature that uses an allocator to accept an allocator at runtime, or possibly at either compile time or runtime."

more in this wiki! https://github.com/zig-lang/zig/wiki/Why-Zig-When-There-is-A...

>"Zig does not support RAII or operator overloading because both make it very difficult to tell where function calls happen just by looking at a function body."

How about showing an error if you don't call the destructor manually?

Memory is manually managed, yes. We do have defer (as in go) for slightly easier resource management.

Zig doesn't have a default memory allocator. Allocators instead are expected to be passed as an argument to functions as they need them. This makes it trivial to replace an allocator with something custom or use multiple different allocators within a small code block.

A contrived example:

  const std = @import("std");

  pub fn GiveMeAnInt(alloc: &std.mem.Allocator) -> %&u32 {
      return alloc.create(u32);
  }

  test "using two allocators" {
      const int1 = try GiveMeAnInt(std.heap.c_allocator);
      *int1 = 2;
      // Would usually store the allocator with the type on construction.
      defer std.heap.c_allocator.destroy(int1);

      const int2 = try GiveMeAnInt(std.debug.global_allocator);
      *int2 = 2;
  }
I'd like to see a C equivalent of this, just for comparison's sake to Zig

  #include <stdint.h>
  #include <string.h>
	
  typedef struct {
      int32_t a;
      int32_t b;
  } Foo;
	
  int main(void)
  {
      uint8_t array[1024];
      memset(array, 1, sizeof(array));
	
      Foo *foo = (Foo*)(&array[0]);
      foo->a += 1;
  }

Using clang 3.8.0-2. Compiling examples with `clang -S -emit-llvm`.

It appears that the array is aligned to the minimum ABI requirement of 16 by default? There may be a note of this in the standard, can't recall off the top of my head.

  %array = alloca [1024 x i8], align 16
  ...
  %6 = load i32, i32* %5, align 4
  ...
  store i32 %7, i32* %5, align 4


We can also explicitly specify the alignment required in C11.

  #include <stdalign.h>
  #include <stdint.h>
  #include <string.h>

  typedef struct {
      int32_t a;
      int32_t b;
  } Foo;

  int main(void)
  {
      uint8_t alignas(alignof(Foo)) array[1024];
      memset(array, 1, sizeof(array));

      Foo *foo = (Foo*)(&array[0]);
      foo->a += 1;
  }
Results in the following IR.

  %array = alloca [1024 x i8], align 4
  ...
  %6 = load i32, i32* %5, align 4
  ...
  store i32 %7, i32* %5, align 4
On 64bit Linux, stack frames are always aligned at 16 byte boundaries. The first 8 bytes of the frame contains the return address then there are 8 bytes of padding and then comes the stack allocations.

I think the example is poorly constructed, because it is inconceivable that the address of the start of an array would not be aligned to sizeof(int*) bytes.

The example is illustrative enough: all the array needs to be misaligned in practice is a small value on the stack near it, e.g. if the Rust code has `let x: u8 = 1;` inserted after the array (or, I imagine, `uint8_t x = 1;` in the C, etc.), then the array's address is odd.
Why would it be? I tried it with msvc and it always manages to put arrays in stack frames at 8 byte boundaries. I can't see why a compiler would not do that.
Also curious, would one ever do this in C? I understand the need in Rust and Zig, but in C is this usual?
I’m finding Zig easier to learn and hold in my head so that also helps me write correct code and safe code. Zig is pretty much one man’s work and is very impressive. I’m still playing with Rust but I am using Zig as my C replacement right now.
One big glaring flaw IMO is that it is not really possible to just turn off certain checks as opposed to turning them all off. For instance, maybe I need to call an unsafe C api or something but could still use the borrow checker.
An `unsafe` block only enables extra features, it doesn't change existing behaviour of safe Rust. Specifically, it allows calling `unsafe` functions (FFI and pure Rust `unsafe` ones), dereferencing raw pointers and some minor other stuff (e.g., inline assembly, some manipulations of packed structs). The borrow checker still works on references, the trait system still enforces Send/Sync for concurrency, and the type system still requires things to have matching types.

It's definitely true that having a one-dimensional `unsafe` might seem unnecessarily powerful in some cases (e.g. a particular unsafe block might just need to do some pointer offsetting and dereferencing, but no FFI), but it isn't a "you're on your own" hammer.

The Rust book clearly states:

"It’s important to understand that unsafe doesn’t turn off the borrow checker or disable any other of Rust’s safety checks" [1]

"unsafe" unlocks only 4 things: Dereferencing a raw pointer, Calling an unsafe function or method, Accessing or modifying a mutable static variable, Implementing an unsafe trait.

[1] https://doc.rust-lang.org/book/second-edition/ch19-01-unsafe...
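
A small sketch of two of those unlocked operations (dereferencing a raw pointer and calling an unsafe function). The function names are made up for illustration.

```rust
/// Hypothetical unsafe function: the caller must pass a valid, aligned
/// pointer to an initialized i32.
unsafe fn read_raw(p: *const i32) -> i32 {
    unsafe { *p }
}

/// Doubles the value through a raw pointer and reads it back.
fn double_via_raw(x: &mut i32) -> i32 {
    let p: *mut i32 = x;
    unsafe {
        *p *= 2; // dereferencing a raw pointer: only allowed in unsafe
        read_raw(p) // calling an unsafe fn: only allowed in unsafe
    }
}

fn main() {
    let mut n = 21;
    println!("{}", double_via_raw(&mut n));

    // The borrow checker is still on inside unsafe blocks. Taking two
    // simultaneous `&mut` borrows of the same Vec and using both would
    // still fail to compile with E0499, with or without `unsafe`.
}
```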

In Rust, the borrow checker is still enabled in unsafe blocks.
Had never heard of zig. Does it also provide memory safety without a GC like Rust?
No, not to the same extent.

It attempts to make C-style memory management as safe as possible, and also make it easy to use different memory allocators, but does not attempt advanced techniques like borrow checkers.

There's also a pretty good metaprogramming system, so it may be possible to implement some smart memory-management libraries.

Zig is about simplicity. It's a C (and partly C++) replacement, not a Rust replacement.

Think of it this way: I could easily imagine a TCC-like, dirt-simple, super-fast compiler for Zig. I'm not sure we'll ever see the same for Rust.

That's nothing against Rust, just saying they have very different goals.

"Unsafe Zig" encompasses all of Zig, much like e.g. C.
I find it so funny people are so fixed on bounds checking. A minimal run time environment is good. It's easier to port and runs faster. Further, there are more issues than bounds checking.

Also, a big part of it is that companies don't really pay for quality software. They just care about software that mostly works, made to cost. I don't see rust reducing this cost much except. First, one still has to interact with hardware, which does not fit the run-time model of Rust, Zig, or (insert safe language). Second, as soon as you start interacting with software outside of that model, the same issues apply.

> I find it so funny people are so fixed on bounds checking. A minimal run time environment is good. It's easier to port and runs faster. Further, there are more issues than bounds checking.

Bounds checking on arrays is a compile-time check in Zig. Other forms of bounds-checking can be disabled in release-mode.

I don't see a single compelling reason why you wouldn't at least want bounds checking in debug mode. If you're out of bounds, something is wrong, and it's always better to get an early and precise error about it.

In Zig you can take slices of arrays or pointers, which contain a pointer and a length. This is not just about safety, it's also a convenience. There are a lot of use cases where you want to pass around both a pointer and a length.
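
For comparison, Rust slices work the same way: a `&[T]` is a pointer plus a length, and the length is what makes a bounds check possible. A minimal Rust sketch of the idea (not Zig code; `sum_first` is a made-up helper):

```rust
use std::mem::size_of;

/// Hypothetical helper: sums the first `n` elements, or returns None if
/// `n` is out of bounds for the slice.
fn sum_first(data: &[u32], n: usize) -> Option<u32> {
    // `get` performs the bounds check using the slice's stored length.
    data.get(..n).map(|s| s.iter().sum())
}

fn main() {
    let values = [1u32, 2, 3, 4];
    println!("{:?}", sum_first(&values, 3));
    println!("{:?}", sum_first(&values, 5));
    // A slice reference carries both a pointer and a length.
    assert_eq!(size_of::<&[u32]>(), 2 * size_of::<usize>());
}
```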

Considering how many extremely serious bugs have resulted from a lack of bounds-checking, and considering the relatively low run-time overhead of doing it (especially with some decent optimizations from the compiler), I don't find it funny at all.

> I don't see rust reducing this cost much except.

At scale, a language with a module system will reduce cost substantially.

If something is specified as "unsafe", it is implemented correctly if it is unsafe - ask Intel ;-)
Take off every zig... for great justice!
all your codebase are belong to us!
> we are professionals, and so we do not accept undefined behavior

Lol'd, tell that to the wannabe-elite C programmers.