Hacker News
by chongli 1171 days ago
I went to look up undefined behaviour in Rust and I got this scary warning:

Warning: The following list is not exhaustive. There is no formal model of Rust's semantics for what is and is not allowed in unsafe code, so there may be more behavior considered unsafe. The following list is just what we know for sure is undefined behavior. Please read the Rustonomicon before writing unsafe code.

After the warning was a list of many of the same types of things that are undefined behaviour in C. In addition, there’s a bunch more undefined behaviour related to improper usage of the unsafe keyword.

So I don’t think you get a free lunch with Rust here. What you get is a “safe” playground if you stay within the guard rails and avoid using the unsafe keyword. But then you are limited to writing programs which can be expressed in safe Rust, a proper subset of all programs you might want to write.

Furthermore, the lack of a formal specification for Rust is one area where it lags behind C, a standardized language. All of the undefined behaviour in C is decreed and documented by the standard, having been decided by the committee. Rust, on the other hand, may have weird and unpredictable behaviour that you just have to debug yourself, which may or may not be compiler bugs.

2 comments

I agree Rust isn’t perfect, but I think you underestimate the value of “safe” code.

I often write programs that have unsafe code. However, the unsafe code is never more than 100 lines, which means I have a very small amount of code to reason about. Rust users expect (of course, you as a programmer have to enforce this) that it should be impossible to cause UB from safe code, so my “safe interface” to my unsafe code ensures my code can’t cause UB, no matter what I call.
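A minimal sketch of that pattern, not taken from any real project: `middle` is a hypothetical helper whose body uses `unsafe`, but the check in front of it means no safe caller can reach the unchecked access with a bad index.

```rust
/// Hypothetical helper (name and purpose are made up for illustration):
/// returns a reference to the middle element of a slice, or None if empty.
pub fn middle<T>(items: &[T]) -> Option<&T> {
    if items.is_empty() {
        return None;
    }
    let idx = items.len() / 2;
    // SAFETY: the slice is non-empty, so idx = len / 2 is strictly less
    // than len and the unchecked access stays in bounds.
    Some(unsafe { items.get_unchecked(idx) })
}

fn main() {
    let v = vec![10, 20, 30];
    assert_eq!(middle(&v), Some(&20));

    let empty: Vec<i32> = Vec::new();
    assert_eq!(middle(&empty), None);
}
```

The only lines that need careful review are the `unsafe` expression and the check that guards it; everything that calls `middle` is ordinary safe Rust.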

One problem with Rust is that when you mess up, it generally panics. I think that’s better than buffer overflows and the like, but it’s still not a good user experience.

This means there is a very small amount of code I have to really think about, while in C or C++ I have to think about basically any place x[i] appears (regardless of whether x is a pointer or a std::vector).
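To make the x[i] comparison concrete on the Rust side, here is a small sketch (the function names `get_or_zero` and `index_panics` are made up for illustration). An out-of-bounds index in safe Rust either panics or, with `get`, turns into an `Option` you have to handle; it does not read past the buffer.

```rust
use std::panic;

/// Hypothetical helper: look up values[i], falling back to 0 when i is
/// out of bounds. `get` returns None instead of panicking.
fn get_or_zero(values: &[i32], i: usize) -> i32 {
    values.get(i).copied().unwrap_or(0)
}

/// Hypothetical helper: reports whether plain indexing with `values[i]`
/// panics. The panic message is still printed to stderr by the default hook.
fn index_panics(values: Vec<i32>, i: usize) -> bool {
    panic::catch_unwind(move || values[i]).is_err()
}

fn main() {
    let v = vec![1, 2, 3];
    assert_eq!(get_or_zero(&v, 1), 2);
    assert_eq!(get_or_zero(&v, 7), 0);

    // In-bounds indexing is fine; out-of-bounds indexing panics.
    assert!(!index_panics(v.clone(), 1));
    assert!(index_panics(v, 7));
}
```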

You can of course write safe C code, and people do, but it’s hard, and it only takes one slip-up anywhere in your program to blow it.

In one sense, C is the unsafe code block for myriad other languages, like Python. Python users don’t want to deal with undefined behaviour either. They want to write their high level code in NumPy or PyTorch and just have everything work very fast.

Little do they know: they rely on C for those libraries and for things like ATLAS and LAPACK, which implement the underlying numerical linear algebra code. Well, it turns out that ATLAS relies pretty heavily on optimizing C compilers to generate optimal code on many different platforms. At the bottom of all this are the many loop optimizations included in compilers which, thanks to undefined behaviour in the C spec, are able to assume that code is always on the happy path.
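A commonly cited instance of this is signed integer overflow: C leaves it undefined, so an optimizer may assume a loop counter never wraps. As a rough contrast in Rust (the helper name `sum_checked` is made up for illustration), overflow is never undefined in safe code, and you choose the behaviour explicitly:

```rust
/// Hypothetical helper: sums a slice, returning None if the running
/// total would overflow an i32 at any step.
fn sum_checked(values: &[i32]) -> Option<i32> {
    values.iter().try_fold(0i32, |acc, &x| acc.checked_add(x))
}

fn main() {
    // checked_add reports overflow as None.
    assert_eq!(i32::MAX.checked_add(1), None);
    // wrapping_add wraps around in two's complement.
    assert_eq!(i32::MAX.wrapping_add(1), i32::MIN);

    assert_eq!(sum_checked(&[1, 2, 3]), Some(6));
    assert_eq!(sum_checked(&[i32::MAX, 1]), None);
}
```

Whether the explicit checks cost anything measurable depends on the loop and the compiler, so treat this as an illustration of the semantics rather than a performance claim.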

It also turns out that Rust includes bindings to ATLAS and LAPACK. I would imagine at some point people might want to write a new linear algebra package in pure Rust. I think it’ll be quite difficult to match the performance of those two in safe Rust, but we’ll see.

Isn't LAPACK written in Fortran?
You're right, and ATLAS is as well, but Fortran has undefined behaviour [1] for all the same reasons that C does.

[1] https://stackoverflow.com/a/57558908

C does not have a formal specification either. It has a standards document written in formal English, but that document does not provide a formal spec of C's semantics. A formal spec of a programming language's semantics would entail using a formal semantic model such as operational or denotational semantics. Some programming languages do specify formal semantics for the entire language or some subset of it, but C is not one of them.

Your claim that the C Standard lists all undefined behavior is actually false. The C Standard only lists the explicit undefined behavior; it does not list the implicit undefined behavior, meaning behavior that is undefined simply because the standard never defines it. There have been efforts to make just such a list, but it's an incredibly difficult task.