by adgjlsfhk1 680 days ago
The really big difference is the searchability and frequency of possibly unsafe operations. If you want to audit all possible unsafe lines of code in a Rust project, you can grep for "unsafe" and find all of them (and in most projects there will be very few if any). In C, on the other hand, you need to look at literally every indexing operation, every pointer dereference, every use of a variable (to make sure it isn't potentially used after free or before initialization), every cast, and probably some extras that I've forgotten. As such, rather than having a low double digit number of cases to look at, you have to look at the vast majority of lines of code.

While true, my point is that you can write C in a way that many functions are also obviously free of UB, and you only need to carefully vet the pointer arithmetic in some low-level functions.

So I agree with the point in principle, I just do not like the "spin" of "every line of C is a time bomb nobody can understand" while in Rust you just have to look at some lines of "unsafe" and all is good.

It's not my experience that C can be obviously free of UB and I'm curious to know how you approach that. I'm not aware of any methods or tools that claim to achieve it, and there's a long history of "correct" programs written by experts that were later discovered to contain subtle UB as automated analysis improved. Here's one example, from Runtime Verification: https://runtimeverification.com/blog/mare-than-14-of-sv-comp...

For example, the following function obviously has no UB:

unsigned int mul(unsigned int x, unsigned int y) { return x * y; }

Or there are many high-level function structures such as the following, which also have no UB (with some assumptions about the called functions):

void bar() { struct foo *p = foo_alloc(); foo_do1(p); foo_do2(p); foo_delete(p); }

Such code can be easily screened, and this can also be done automatically. There is a lack of open-source tools that can do this, but I have an experimental GCC branch that starts to do this and looks promising.

> Or there are many high-level function structures such as the following, which also have no UB (with some assumptions about the called functions):

    void bar() { struct foo *p = foo_alloc(); foo_do1(p); foo_do2(p); foo_delete(p); }

Are we assuming foo_alloc always succeeds? malloc returns NULL to indicate failure to allocate, which this code wouldn't handle.
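
To make the question concrete, here is a minimal sketch of what bar might look like if foo_alloc can fail the way malloc does. The struct layout, the bodies of foo_alloc, foo_do1, foo_do2 and foo_delete, and the bool return value are all assumptions for illustration; the thread never defines them.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical type: the thread never shows the real struct foo. */
struct foo {
    int value;
};

/* Assumed to behave like malloc: returns NULL on allocation failure. */
struct foo *foo_alloc(void)
{
    struct foo *p = malloc(sizeof *p);
    if (p != NULL)
        p->value = 0;
    return p;
}

/* Assumed contract: p is non-NULL and points to a live object. */
void foo_do1(struct foo *p) { p->value += 1; }
void foo_do2(struct foo *p) { p->value *= 2; }

/* Like free, accepts NULL. */
void foo_delete(struct foo *p) { free(p); }

/* Same structure as bar() above, plus the failure path.
 * Returns false if the allocation failed, true otherwise. */
bool bar(void)
{
    struct foo *p = foo_alloc();
    if (p == NULL)
        return false;   /* never hand a failed allocation to foo_do1 */
    foo_do1(p);
    foo_do2(p);
    foo_delete(p);      /* p is not used after this point */
    return true;
}
```

Whether the original bar has UB then depends entirely on the contract of foo_alloc and foo_do1, which is the "assumption on the called functions" being discussed.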

> Such code can be easily screened and also this can be done automatically.

That doesn't sound right at all. Robust static analysis of C code is extremely involved. It's an area of ongoing research.

Prior efforts along these lines have not been successful. Even adopting the MISRA C ruleset doesn't guarantee absence of undefined behaviour, for instance.

The first has no UB, but this trivial modification does:

unsigned short mul(unsigned short x, unsigned short y) { return x * y; }

I don't know about you, but I wouldn't think to treat these any differently unless I put on my language lawyer hat.
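
A sketch of the difference, assuming a typical platform where unsigned short is 16 bits and int is 32 bits (the function names are made up for illustration). Both operands are promoted to int before the multiplication, so 65535 * 65535 does not fit in a signed int. Converting one operand to unsigned int first keeps the whole multiplication unsigned, which wraps instead of overflowing:

```c
/* As posted: x and y are promoted to int, so the product can overflow
 * a signed int (UB) when int is 32 bits and both inputs are large. */
unsigned short mul_promoted(unsigned short x, unsigned short y)
{
    return x * y;
}

/* Converting x to unsigned int makes the usual arithmetic conversions
 * pick unsigned int for the multiplication, which wraps modulo
 * UINT_MAX + 1. The conversion back to unsigned short is also
 * well defined (reduction modulo USHRT_MAX + 1). */
unsigned short mul_wrapping(unsigned short x, unsigned short y)
{
    return (unsigned short)((unsigned int)x * y);
}
```

The two functions agree whenever the product fits in an int; they only differ in that mul_wrapping stays defined for inputs like 65535 and 65535.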

It is converted to int, so you have a signed multiplication. I don't think you need to be a language lawyer to know this, just very basic C.

But I also do not worry about signed overflow anyhow, because compilers can turn it into a trap.

I don't think I need to explain why it's unintuitive that multiplying two unsigned numbers sometimes results in a signed multiplication, even though signed types appear nowhere in the code. I couldn't tell you how many times I've seen some DSP application taking uint16s and throwing them into a filter without realizing it could be UB.

Language standards shouldn't rely on compiler options to save developers here. There are a lot of compilers in the world that don't support the same range of options GCC and clang have, such as CompCert. Those are often the ones building safety-critical applications these days, where trapping would be inappropriate.
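
Where no trapping option is available, one alternative is an explicit check before the multiplication. This is only a sketch: checked_mul_nonneg is a made-up helper, and it is restricted to non-negative operands (which covers promoted unsigned 16-bit samples) to keep the check short.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: multiplies two non-negative ints and stores the
 * product in *out. Returns false, leaving *out untouched, if either
 * operand is negative or the product would exceed INT_MAX. */
bool checked_mul_nonneg(int x, int y, int *out)
{
    if (x < 0 || y < 0)
        return false;
    if (x != 0 && y > INT_MAX / x)
        return false;   /* x * y would overflow int */
    *out = x * y;       /* safe: the product is at most INT_MAX */
    return true;
}
```

The caller then has to decide what to do on failure (saturate, report an error, and so on), which is exactly the decision a trap would otherwise make for it.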

The key point is that no matter how you write your C code, anyone else who wants to verify the absence of memory safety problems needs to read every single line to determine which ones do the low-level unsafe bits.

I understand this, but its importance is highly exaggerated. How in the world does it make sense to audit only for memory safety? There are plenty of other safety and security issues. Only if you pretend that memory safety is all that matters can you claim a fundamental advantage in needing to look only at "unsafe blocks" and nothing else. Now, you can say that with limited time we can at least more easily ensure memory safety by reviewing "unsafe blocks" carefully and neglecting other problems. This is true, and I agree that it is an advantage, but the overall improvement in safety and security is an incremental reduction in risk, not a fundamental one.

This isn't only about formal audits. Memory corruption and UB-type bugs are also some of the hardest to debug, since they may not reproduce in debug builds.

With sanitizers and Valgrind I do not quite see this. In my experience, subtle logic bugs in overly complicated code are much harder to debug.