by blackpill0w 1014 days ago
How difficult is it to make a compiler extension that remembers buffers' size and checks if we're overflowing at each access? It could be used at least just in debug versions of critical software.

It doesn't sound impossible to me but I know nothing about compiler development :)

6 comments

Hard. Apple actually has an RFC for this, where functions taking buffer-like parameters are adjusted to take an additional length parameter, and the compiler then plumbs lengths through the code to insert a bounds check at each use. This can work in many cases, but not all.
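
To make the idea concrete, here is a rough hand-written sketch of what "plumbing a length through" amounts to. The function names `read_at_checked` and `sum_first_n` are made up for illustration, and this is not the RFC's annotation syntax or the code a compiler would actually generate, just the general shape of the transformation:

```c
#include <stddef.h>

/* Hypothetical "after" version of an accessor: the buffer's length travels
   with the pointer as an extra parameter, and the access is checked. */
int read_at_checked(const int *buf, size_t len, size_t index, int *out)
{
    if (buf == NULL || out == NULL || index >= len) {
        return -1;              /* out of bounds: fail instead of reading */
    }
    *out = buf[index];
    return 0;
}

/* Every caller now has to carry the length too, which is the part that
   gets painful in a large codebase. */
int sum_first_n(const int *buf, size_t len, size_t n, long *sum_out)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++) {
        int value;
        if (read_at_checked(buf, len, i, &value) != 0) {
            return -1;          /* n exceeded the real length */
        }
        sum += value;
    }
    *sum_out = sum;
    return 0;
}
```

With a 4-element array, asking `sum_first_n` for 5 elements returns -1 instead of reading past the end. Doing this by hand for every function signature is exactly the work the compiler feature tries to automate.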

Rolling out this sort of change across a large codebase is hard as shit. While it sounds like it is mostly transparent, as soon as you run into a sufficiently large codebase all sorts of things start blowing up that you need to fix by hand before such a feature can be rolled out.

You can also do this with pointer tagging and some other techniques, but without hardware support it is amazingly slow. You can see just how much slower an ASan (AddressSanitizer) build is, for example.

Apple is basically catching up with the Windows XP SP2 effort, which led to the introduction of SAL annotations on Windows, and yes, it was the reason for its delay.
I think the short answer is "trivial in some cases, impossible in others". Your compiler could almost certainly inspect every allocation and internally tag each pointer with its size. The problem comes with everything else: once you add loops and conditionals, the length associated with a pointer can be all over the place. You'd basically need a symbolic executor tracking every pointer.

There are some big issues with this:

1. It's slow. Symbolic execution involves the interpretation of your program.

2. It would be imperfect and you'd likely have false positives.

3. It would likely be incomplete - for example, how would you handle code for which you only have a header and no source?

So it's a good idea but it's very hard to make practically useful.

The easy way to do it involves changing the ABI of pointers so that they are now (address, bounds) pairs instead of just addresses. However, an awful lot of C code assumes that a pointer is just an address, and changing the ABI in this way will break the vast majority of non-trivial programs. (Witness the difficulty CHERI has in getting major software to work with it.)
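
A minimal sketch of the (address, bounds) idea done as a library type rather than an ABI change. The names `fat_ptr`, `fat_make`, `fat_advance` and `fat_store` are hypothetical, and this is not how CHERI does it (CHERI enforces bounds in hardware); it only shows what such a pair has to carry around:

```c
#include <stddef.h>

/* Hypothetical fat pointer: an address plus the number of valid bytes
   starting at that address. */
typedef struct {
    unsigned char *base;
    size_t len;
} fat_ptr;

fat_ptr fat_make(void *base, size_t len)
{
    fat_ptr p = { (unsigned char *)base, len };
    return p;
}

/* Pointer arithmetic has to shrink the bounds along with moving the
   address. Returns an empty pointer (NULL, 0) if offset is out of range. */
fat_ptr fat_advance(fat_ptr p, size_t offset)
{
    fat_ptr r = { NULL, 0 };
    if (p.base != NULL && offset <= p.len) {
        r.base = p.base + offset;
        r.len = p.len - offset;
    }
    return r;
}

/* Checked store: returns 0 on success, -1 if index is out of bounds. */
int fat_store(fat_ptr p, size_t index, unsigned char value)
{
    if (p.base == NULL || index >= p.len) {
        return -1;
    }
    p.base[index] = value;
    return 0;
}
```

Note that `sizeof(fat_ptr)` is larger than `sizeof(void *)`. Any code that stuffs a pointer into an integer, a union, or a fixed-size struct slot stops working once the compiler makes every pointer look like this, which is where the ABI breakage comes from.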
You can, it's called Valgrind (or more accurately, its Memcheck tool). And people don't use Valgrind because it is slooooooow. Dynamic checking is useful, but it is not the ultimate answer.
> remembers buffers' size

Where?

Once you have a bare pointer, you've lost track of what the original definition might have been, so you (the compiler / runtime / programmer) have no way of knowing that you've exceeded the size.
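
A small illustration of the loss at the type level, assuming an ordinary C implementation with thin pointers (the function name `size_seen_by_callee` is made up for this example):

```c
#include <stddef.h>

/* Inside the callee the parameter is a bare pointer, so sizeof reports the
   size of the pointer itself, not the size of whatever array the caller
   happened to pass in. */
size_t size_seen_by_callee(const char *p)
{
    return sizeof p;
}
```

In the caller, `sizeof buf` for `char buf[100]` is 100, but `size_seen_by_callee(buf)` returns `sizeof(char *)`. Nothing in the callee's view of `p` says how many bytes are behind it.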

That's not true, it is merely true on most ABIs. The only case where C really erases this information is casting to uintptr_t and back.
Unless it is clearly specified in the ISO C standard, it is true in practice, and retained bounds information is impossible to rely on.
GCC also has some builtins to check the size of the object a pointer points to, when the compiler is able to figure it out.

https://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Object-Size-Che...
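
A small sketch of how `__builtin_object_size` can be used, assuming GCC or Clang. The `CHECKED_SET` macro and `checked_set_impl` helper are hypothetical names for this example, not part of GCC. The builtin returns `(size_t)-1` (for type 0) when the compiler cannot determine the size, and whether it can may depend on the compiler and optimization level:

```c
#include <stddef.h>

/* Helper: refuses the write only when the size is known and too small. */
static inline int checked_set_impl(char *dst, size_t known_size,
                                   size_t index, char value)
{
    /* (size_t)-1 means "size unknown", so no check is possible. */
    if (known_size != (size_t)-1 && index >= known_size) {
        return -1;
    }
    dst[index] = value;
    return 0;
}

/* The builtin must see the original expression, so it is applied in a
   macro at the call site. Note that dst is evaluated twice. */
#define CHECKED_SET(dst, index, value) \
    checked_set_impl((dst), __builtin_object_size((dst), 0), (index), (value))
```

When `dst` is a local array the compiler can usually see its size and an out-of-range index is rejected. When `dst` is a bare pointer parameter, the size is typically unknown and the write goes through unchecked, which is the "when the compiler is able to figure it out" caveat in practice.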

Which is why I harp on the idea that the real problem is the gold bricks on WG14 who are intentionally blocking improvements to make C safer.

I'd also point out that if you can implement C on the 16-bit x86 segmented architecture, you can certainly implement C with fat pointers too.

It's trivial but Big Tech is in bed with Big Hacker

Or it's hard like everyone keeps saying.

I'm going with the second option