Hacker News
by EustassKid 1007 days ago
Why aren't people using safe C compilers or libraries and stuff like that? Do they affect performance that much? If yes then what about libraries written in C so that they can be used in other languages (meaning performance is not the number one concern)?
3 comments

This is a great question.

The answer is that it's literally impossible to write a "safe C compiler" since the language is inherently memory unsafe.

There are various static analysis tools that can try to simulate C programs and try to automatically discover memory management bugs, but due to fundamental limitations of computation they can never catch all possible faults.

How difficult is it to make a compiler extension that remembers buffers' size and checks if we're overflowing at each access? It could be used at least just in debug versions of critical software.

It doesn't sound impossible to me but I know nothing about compiler development :)

Hard. Apple actually has an RFC for this, where functions taking buffer-like parameters are adjusted to take an additional length parameter, and the compiler then edits the code to plumb lengths through all of these things and insert a bounds check at each use. This can work in many cases, but not all.
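
For illustration, here is roughly what that transformation amounts to when done by hand. This is only a sketch of the idea, not the syntax or the generated code of Apple's proposal; `sum_prefix` and its parameters are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example. The original signature would have been
 *     long sum_prefix(const int *buf, size_t n);
 * which gives the callee no way to know how large buf really is.
 * The adjusted version carries the buffer's length with the pointer. */
int sum_prefix(const int *buf, size_t buf_len, size_t n, long *out)
{
    assert(buf != NULL && out != NULL);
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* The inserted bounds check. A real implementation would likely
         * trap; returning an error keeps this sketch easy to exercise. */
        if (i >= buf_len)
            return -1;
        total += buf[i];
    }
    *out = total;
    return 0;
}
```

Every caller now has to supply `buf_len`, which is the part that gets painful once the lengths have to be threaded through a large codebase.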

Rolling out this sort of change across a large codebase is hard as shit. While it sounds like it is mostly transparent, as soon as you run into a sufficiently large codebase all sorts of things start blowing up that you need to fix by hand before such a feature can be rolled out.

You can also do this with pointer tagging and some other techniques, but without hardware support this is amazingly slow. You can see just how much slower an asan build is, for example.

Apple is basically catching up with the Windows XP SP2 effort, which led to the introduction of SAL annotations on Windows, and yes, it was the reason for its delay.
I think the short answer is "trivial in some cases, impossible in others". It's almost certainly possible that your compiler could inspect every allocation and tag each pointer with it internally. The problem comes with everything else - once you add loops and conditionals the length of that pointer can be all over the place. You'd basically need a symbolic executor tracking every pointer.

There are some big issues with this:

1. It's slow. Symbolic execution involves the interpretation of your program.

2. It would be imperfect and you'd likely have false positives.

3. It would likely be incomplete - for example, how would you handle the situation of only having a header?

So it's a good idea but it's very hard to make practically useful.

The easy way to do it involves changing the ABI of pointers so that they are now (address, bounds) pairs instead of just addresses. However, an awful lot of C code assumes that a pointer is just an address, and changing the ABI in this way will break the vast majority of non-trivial programs. (Witness the difficulty CHERI has in getting major software to work with it.)
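
A minimal sketch of the (address, bounds) idea, written as an ordinary struct rather than as a compiler ABI change. `struct fat_ptr` and `fat_load` are hypothetical names, and this is not CHERI's actual capability format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fat pointer: an address plus the size of the object it
 * points into. */
struct fat_ptr {
    unsigned char *base;
    size_t len;
};

/* Checked load: returns 0 and stores the byte on success, -1 if idx is
 * out of bounds. */
int fat_load(struct fat_ptr p, size_t idx, unsigned char *out)
{
    assert(out != NULL);
    if (p.base == NULL || idx >= p.len)
        return -1;
    *out = p.base[idx];
    return 0;
}

/* Code that assumes "a pointer is just an address" breaks here: the pair
 * no longer fits in a plain pointer-sized slot. */
_Static_assert(sizeof(struct fat_ptr) > sizeof(void *),
               "fat pointer is wider than a plain pointer");
```

Anything that stores a pointer in a `uintptr_t`, packs it into a fixed-size field, or passes it across an existing binary interface would have to change to accommodate the wider representation.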
You can, it's called valgrind (or more accurately, memcheck). And people don't use valgrind because it is slooooooow. Dynamic checking is useful, but it is not the ultimate solution.
> remembers buffers' size

Where?

Once you have a bare pointer, you've lost track of what the original definition might have been, so you (the compiler / runtime / programmer) have no way of knowing that you've exceeded the size.
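
A small sketch of what that looks like on typical ABIs, where a pointer is just an address. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* The callee only sees a bare pointer, so sizeof reports the size of the
 * pointer itself, not of the caller's array. */
size_t size_seen_by_callee(const char *p)
{
    assert(p != NULL);
    return sizeof p;
}

/* Where the array type is still in scope, the full size is visible. */
size_t size_seen_by_caller(void)
{
    char buf[64];
    return sizeof buf;
}
```

Passing a 64-byte array to `size_seen_by_callee` yields `sizeof(char *)`, not 64: the array decays to a pointer at the call and the length is no longer part of the type.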

That's not true, it is merely true on most ABIs. The only case where C really erases this information is casting to uintptr_t and back.
Unless it is clearly specified in the ISO C standard, it is true in practice, and something that is impossible to rely on.
gcc also has some builtins to check pointer sizes when the compiler is able to figure it out.

https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Object-Size-Che...
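
A hedged sketch of how such a builtin can be used. `__builtin_object_size` is the documented GCC builtin (Clang supports it as well); `OBJ_SIZE`, `copy_if_fits` and `COPY_CHECKED` are hypothetical helpers made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Size of the object p points into, or (size_t)-1 when the compiler
 * cannot determine it. The builtin does not evaluate its argument for
 * side effects. */
#define OBJ_SIZE(p) __builtin_object_size((p), 0)

/* Copy n bytes into dst, refusing when dst is known to be too small.
 * When dst_size is (size_t)-1 the size is unknown and nothing is checked. */
int copy_if_fits(char *dst, size_t dst_size, const char *src, size_t n)
{
    assert(dst != NULL && src != NULL);
    if (dst_size != (size_t)-1 && n > dst_size)
        return -1;
    memcpy(dst, src, n);
    return 0;
}

/* The size has to be computed at the call site, where the compiler can
 * still see the destination object. */
#define COPY_CHECKED(dst, src, n) \
    copy_if_fits((dst), OBJ_SIZE(dst), (src), (n))
```

The check only helps "when the compiler is able to figure it out": for a destination that arrives as a plain pointer parameter, the builtin typically reports an unknown size and the copy goes unchecked.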

Which is why I harp on the idea that the real problem is the gold bricks on WG14 who are intentionally blocking improvements to make C safer.

I'd also point out that if you can implement C on 16-bit x86's segmented architecture, you can certainly implement C with phat pointers too.

It's trivial but Big Tech is in bed with Big Hacker

Or it's hard like everyone keeps saying.

I'm going with the second option

...What is a "safe C compiler"?
Check on the fiction section sir. Here is the CS one.

Although I think all the C compilers are safe-ish lately. I haven't seen exploits that target defects in output. Usually the error is ID10T located in the prekeyboard device.

...a textbook oxymoron?
Not necessarily a "safe compiler" but maybe safe library for containers and things like that. It seems to me that most if not all major C projects just run sanitizers and static analysers.
rustc /s
All decent C compilers have compilation options so that at run-time any undefined actions, including integer overflow and out-of-bounds accesses, will be trapped.

The only problem is that these options are not the default and most C developers do not use them, especially for release versions.

I always use them, including for releases. In the relatively rare cases when this has a performance impact, I disable the sanitize options only for the functions where this matters and only after an analysis that guarantees that events like overflows or out-of-bounds accesses cannot happen.
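
The comment does not name the options. On GCC and Clang, `-fsanitize=undefined` is one plausible candidate, but that is an assumption. The sketch below shows an explicit equivalent of the overflow check using the `__builtin_add_overflow` builtin, and a per-function opt-out using the `no_sanitize` attribute; `add_checked`, `add_small`, `clamp1000` and `NO_UBSAN` are hypothetical names.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Explicit version of the check a sanitizer would insert around a + b.
 * __builtin_add_overflow returns true when the mathematically correct
 * sum does not fit in *sum. */
bool add_checked(int a, int b, int *sum)
{
    assert(sum != NULL);
    return !__builtin_add_overflow(a, b, sum);
}

/* Per-function opt-out; expands to nothing on compilers without it. */
#if defined(__clang__) || (defined(__GNUC__) && __GNUC__ >= 8)
#define NO_UBSAN __attribute__((no_sanitize("undefined")))
#else
#define NO_UBSAN
#endif

static int clamp1000(int x)
{
    return x < -1000 ? -1000 : (x > 1000 ? 1000 : x);
}

/* Instrumentation is disabled here only because both operands are
 * clamped to [-1000, 1000], so the sum cannot overflow an int. */
NO_UBSAN int add_small(int a, int b)
{
    return clamp1000(a) + clamp1000(b);
}
```

The opt-out is only as good as the argument written in the comment above it, which is the "analysis that guarantees" part of the workflow described here.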

Despite the hype, by default Rust is not safer than C compiled with the right options, because the default for Rust releases is also to omit many run-time checks.

Only when Rust will change the default to keep all run-time checks also in release builds, it will be able to claim that by default it is safer than C.

For now, when safety is desired, both C and Rust must be compiled with non-default options.

> Only when Rust will change the default to keep all run-time checks also in release builds, it will be able to claim that by default it is safer than C.

Which checks are you thinking of? The only thing that comes to mind is that integer overflow wraps instead of panics, but given that bounds are checked, it is still going to be a panic or logic bug rather than a buffer overflow.

It sounds like you're referring to sanitizers.

1. Notably, some sanitizers are not intended for production use. I think this has changed a bit for asan but at one point it made vulns easier to exploit. These aren't mitigations.

2. They're extremely expensive. You need tons of bookkeeping for pointers for them to work. If you're willing to take that hit I don't really understand why you're using C, just use a GC'd language, which is probably going to be faster at that point.

> Only when Rust will change the default to keep all run-time checks also in release builds, it will be able to claim that by default it is safer than C.

The only thing Rust turns off at release is the integer overflow check: overflows panic in debug but wrap in release. That wrap cannot lead to memory unsafety.
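
A rough model of that argument, written in C rather than Rust so it can sit next to the other examples in this thread. `get_checked` stands in for checked indexing and `offset_from_end` for wrapping arithmetic; both are hypothetical names, and unsigned wraparound is well defined in C.

```c
#include <assert.h>
#include <stddef.h>

/* Bounds-checked load, standing in for checked indexing. Returns -1
 * where Rust would panic. */
int get_checked(const int *buf, size_t len, size_t idx, int *out)
{
    assert(buf != NULL && out != NULL);
    if (idx >= len)
        return -1;
    *out = buf[idx];
    return 0;
}

/* Unsigned subtraction: wraps to a huge value when back > len. */
size_t offset_from_end(size_t len, size_t back)
{
    return len - back;
}
```

With `len = 4` and `back = 5` the index wraps to `(size_t)-1`, and the bounds check rejects it. The result is a logic bug or an abort, not an out-of-bounds read.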

FWIW it is not recommended to use asan+co for release builds. These are designed as debugging tools; if you use them in production builds they may actually open up new bugs. See also: https://www.openwall.com/lists/oss-security/2016/02/17/9

I don't think anyone has built anything practically usable that is meant for production, though it wouldn't be impossible to do so.

It's more or less okay to use UBSan in production though, and that can be good.

But sometimes DoS is considered an exploit, and in that case you don't want to make things easier to crash.

> the default for Rust releases is also to omit many run-time checks.

...because the type system and borrow checker satisfy them at compile-time?

The only checks that are omitted at runtime are:

- checks that are exhaustively proven to be unnecessary by LLVM
- checks that can never be triggered in the absence of UB

You shouldn't be triggering UB checks at runtime. If you rely on these checks, you're relying on UB itself, when all UB should be provably impossible.

Do you release with `-fsanitize=address` or what?
You can't really retrofit safety to C. The best that can be achieved is seL4, which, while it is written in C, has a separate proof of its correctness: https://github.com/seL4/l4v

The proof is much, much more work than the microkernel itself. A proof for something as large as webP might take decades.

> A proof for something as large as webP might take decades.

Assuming that it is even provable in the first place.