Hacker News
by snnn 1580 days ago
I think the most critical part of the exploit flow is the integer overflow bug, and it was totally avoidable. I am a software engineer at Microsoft, and half of my time was spent on security and compliance. We have the right tools and the right policies to prevent such things from happening. I'm not saying Microsoft software is free of integer overflow bugs, though. I don't intend to advertise Microsoft's C/C++ development tools here, but they are the ones I know best.

Let's get to the technical part. If you were asked to implement binary search in your favorite programming language, how would you verify your code? With unit tests. How many test cases would you need? More than 10. Binary search implementations are prone to integer overflow bugs (remember the one in the JDK?), but as long as you have enough tests, you don't need to worry too much. But how many is enough? People have failed to implement binary search correctly for decades, not because we don't understand the algorithm well enough or lack excellent software engineers, but because we don't know how to test our code thoroughly. Any non-trivial C/C++ function may need tens of thousands of test cases. You simply can't write them by hand.
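The JDK bug mentioned above came from computing the midpoint as `(low + high) / 2`, which overflows once the sum exceeds the largest `int`. A minimal C sketch of the usual fix, assuming a sorted `int` array whose length fits in an `int` (the name `bsearch_int` is hypothetical, not a standard API):

```c
#include <assert.h>

/* Returns the index of key in the sorted array a[0..n-1], or -1 if absent.
   Hypothetical helper for illustration. */
int bsearch_int(const int *a, int n, int key) {
    assert(n >= 0);  /* precondition: a valid (possibly empty) length */
    int lo = 0;
    int hi = n - 1;
    while (lo <= hi) {
        /* (lo + hi) / 2 can overflow when lo + hi > INT_MAX (the JDK bug).
           lo + (hi - lo) / 2 cannot: here 0 <= hi - lo <= INT_MAX. */
        int mid = lo + (hi - lo) / 2;
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid - 1;
        else
            return mid;
    }
    return -1;
}
```

Unit tests with small arrays will never hit the overflow, which is exactly the point of the comment: the failing inputs only appear with arrays of more than about 2^30 elements.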

You need the right tools: fuzzing and static analysis.

At Microsoft, every file parser should go through fuzzing, which basically means generating random inputs and running your code against them. Nothing fancy. But there is another kind of fuzzing: symbolic execution, which tries to find all the possible execution paths through your code. If you run symbolic execution on your binary search code, you can get 100% path coverage, and the code is then guaranteed free of the bugs the tool checks for. It is like a math proof. Note that this is only practical because SAT solvers have advanced surprisingly far in the last 20 years. Often you also need to make compromises between your business goals and security: most functions can't reach 100% coverage, so you need to simplify them. See https://github.com/klee/klee for a quick start. Though C/C++ is often considered unsafe, it has the best fuzzers.
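A minimal sketch of what a KLEE harness can look like, assuming KLEE is installed and the file is compiled to LLVM bitcode with something like `clang -emit-llvm -c -g -DKLEE_HARNESS`. The `KLEE_HARNESS` macro is a hypothetical switch chosen for this example; `klee_make_symbolic` and `klee_assume` come from KLEE's `klee/klee.h`. KLEE explores the feasible paths through `mid_in_range` and reports any input that makes the `assert` fail. Without the macro the file is plain C, so the same property can also be spot-checked by ordinary unit tests.

```c
#include <assert.h>

/* Function under test: midpoint of the index range [lo, hi]. */
int mid_of(int lo, int hi) {
    return lo + (hi - lo) / 2;
}

/* Property binary search relies on: for 0 <= lo <= hi, lo <= mid <= hi. */
int mid_in_range(int lo, int hi) {
    int m = mid_of(lo, hi);
    return lo <= m && m <= hi;
}

#ifdef KLEE_HARNESS
#include <klee/klee.h>

int main(void) {
    int lo, hi;
    /* Let KLEE treat lo and hi as symbolic: any int value is possible. */
    klee_make_symbolic(&lo, sizeof lo, "lo");
    klee_make_symbolic(&hi, sizeof hi, "hi");
    /* Restrict the search to the ranges binary search actually produces. */
    klee_assume(lo >= 0);
    klee_assume(lo <= hi);
    assert(mid_in_range(lo, hi));
    return 0;
}
#endif
```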

Then there are SAL annotations and static analyzers. In C, whenever you pass a pointer to an array to another function, you should also pass its length, and the callee should check that length. If you forget, the static code analyzer will warn you. That way, if you didn't allocate enough memory, the result is just an error code being returned instead of undefined behavior.
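A minimal sketch of the pointer-plus-length convention with SAL annotations. `_In_reads_`, `_Out_writes_to_`, `_Success_` and `_Must_inspect_result_` are SAL macros from `<sal.h>`, checked by MSVC's `/analyze`; on other compilers this sketch defines them as empty macros (an assumption made only so the example compiles anywhere). The function `copy_bytes` and its error-code convention are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

#ifdef _MSC_VER
#include <sal.h>
#else
/* Outside MSVC the annotations carry no meaning, so define them away. */
#define _In_reads_(n)
#define _Out_writes_to_(size, count)
#define _Success_(expr)
#define _Must_inspect_result_
#endif

/* Copies n bytes from src into dst, whose capacity is dst_len bytes.
   Returns 0 on success, or an error code instead of overrunning dst. */
_Success_(return == 0) _Must_inspect_result_
int copy_bytes(_Out_writes_to_(dst_len, n) unsigned char *dst, size_t dst_len,
               _In_reads_(n) const unsigned char *src, size_t n) {
    if (dst == NULL || src == NULL)
        return EINVAL;
    if (n > dst_len)  /* the callee checks the length it was given */
        return ERANGE;
    memcpy(dst, src, n);
    return 0;
}
```

If a caller passes a buffer smaller than the length it claims, the analyzer can flag the call site; at runtime, an undersized destination produces `ERANGE` rather than a buffer overrun.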

The last thing: use SafeInt to wrap your malloc calls. https://docs.microsoft.com/en-us/cpp/safeint/safeint-library...
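SafeInt itself is a C++ template library. A rough C analogue of the same idea, using a hypothetical wrapper named `checked_alloc_array`, rejects any element count whose byte size would not fit in `size_t` before calling `malloc`:

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocates count * size bytes, or returns NULL if the product would overflow
   size_t. An unchecked malloc(count * size) would instead wrap around and
   silently return a buffer far smaller than the caller expects. */
void *checked_alloc_array(size_t count, size_t size) {
    if (size != 0 && count > SIZE_MAX / size)
        return NULL;  /* count * size does not fit in size_t */
    return malloc(count * size);
}
```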

When we move from the binary search toy example to a real code base, you can clearly see how much extra effort is needed to make the code safe. Pardon me for saying so, but most OSS libraries don't have the resources. Many famous OSS projects are "Mom-and-pop" shops: they have no compliance rules and invest very little in fuzzing. So the big companies really should help them. An integer overflow bug was found in Apple's image renderer, but was that code written by Apple? Not necessarily. Now we all see the importance of the Open Source movement, and it's time to think about how to harden its security. For example, even if I wanted to spend my free time adding SAL annotations to an OSS project I love, would the maintainers accept them?


Why aren’t you using higher-level, memory-safe languages for that? In C#, runtime checks for integer overflow can be enabled with a single compiler switch. The switch is off by default for some reason, but it’s easy enough to enable manually: a single line in the *.csproj file.

If you think GC performance is not good enough, see this proof of concept: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo/Con... That C# code implements a parser for the Mpeg4 format. That format is way more complicated than GIF or even PDF, yet the code runs fine even on very slow computers (Raspberry Pi 4). There’s another similar parser in that project for the MKV format.

I'd prefer to catch such errors at compile-time. The more static the language is, the more optimization and analysis can be done. Sometimes the problem can be simplified: when your CPU is 64-bit capable but you limit array sizes to 2 GB, you can use 64-bit math to calculate memory sizes and avoid integer overflow. Java and Google protobuf are two such examples. Sometimes the 2 GB limit is acceptable, sometimes it is not. Did you know protobuf even tries to limit string sizes to tens of MB for safety? This simplification cannot be accepted as a general solution.
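A minimal sketch of the 2 GB simplification, assuming element counts and sizes are stored as `int32_t` and the total is capped at `INT32_MAX` bytes (the cap and the name `array_bytes` are illustrative choices, not taken from Java or protobuf). Because each factor is below 2^31, the product is below 2^62 and always fits in `int64_t`, so one range check after the multiplication is enough:

```c
#include <stdint.h>

/* Illustrative cap mirroring a "no array larger than 2 GB" policy. */
#define MAX_ARRAY_BYTES ((int64_t)INT32_MAX)

/* Returns count * elem_size in bytes, or -1 if an input is negative or the
   result exceeds MAX_ARRAY_BYTES. */
int64_t array_bytes(int32_t count, int32_t elem_size) {
    if (count < 0 || elem_size < 0)
        return -1;
    /* Both factors are < 2^31, so the 64-bit product is < 2^62: no overflow. */
    int64_t total = (int64_t)count * (int64_t)elem_size;
    return total <= MAX_ARRAY_BYTES ? total : -1;
}
```

Since the cap is 2^31 - 1 bytes, any accepted result also fits in a 32-bit `size_t`.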

Back to your Raspberry Pi 4 example: the CPU is 64-bit, but most users run a 32-bit OS on it, even though most Linux installations today are 64-bit. I believe Google doesn't care much about protobuf's security on 32-bit systems, and the same goes for other OSS software. So if you take it seriously: it works, but it is not safe (when we are talking about integer overflow).
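The 32-bit concern can be made concrete with a little arithmetic. With a 32-bit `size_t`, a size computed as `count * 4` is reduced modulo 2^32, so a count of 0x40000001 elements (just over 2^30) yields 4 bytes instead of about 4 GiB. The sketch below simulates that with `uint32_t`, whose wraparound is well defined in C, so it behaves the same on a 64-bit host; both helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Byte size as a 32-bit platform would compute it: the product wraps modulo 2^32. */
uint32_t size32_unchecked(uint32_t count, uint32_t elem_size) {
    return (uint32_t)((uint64_t)count * elem_size);
}

/* Same computation, but reports failure instead of wrapping. */
bool size32_checked(uint32_t count, uint32_t elem_size, uint32_t *out) {
    if (elem_size != 0 && count > UINT32_MAX / elem_size)
        return false;  /* the true size does not fit in 32 bits */
    *out = count * elem_size;
    return true;
}
```

A parser that allocated `size32_unchecked(count, 4)` bytes and then wrote `count` elements would overrun a 4-byte buffer; the checked variant refuses the input instead.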

> I'd prefer to catch such errors at compile-time.

I don't believe it's possible. These integers often come from user input, disk, or network. The compiler can't validate them simply because it doesn't have the data.

Even when it is possible, catching them at compile-time is insanely complicated and computationally expensive, yet doing it at runtime is very simple.

The runtime performance overhead is very small: branch prediction is quite efficient on modern CPUs, these branches are almost never taken, and the JIT compiler knows that and emits code that will be predicted correctly even when the predictor has no history for the branch.
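C code can get the same kind of runtime check explicitly. GCC and Clang provide `__builtin_add_overflow`, and `__builtin_expect` can mark the failure branch as unlikely, which is the "almost never taken" pattern described above. A minimal sketch, assuming a hypothetical parser that reads a chunk offset and length from an untrusted file header (these builtins are GCC/Clang extensions and are not available in MSVC):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Validates that the chunk [offset, offset + length), read from untrusted input,
   lies inside a buffer of buf_len bytes. Requires GCC or Clang for the builtins. */
bool chunk_in_bounds(uint32_t offset, uint32_t length, size_t buf_len) {
    uint32_t end;
    /* __builtin_add_overflow returns true if offset + length wrapped around.
       For well-formed files this branch is never taken. */
    if (__builtin_expect(__builtin_add_overflow(offset, length, &end), 0))
        return false;
    return end <= buf_len;
}
```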

> if you take it seriously, it works but it is not safe

Not sure I follow. Let's pretend I am taking it reasonably seriously, despite it being an old, unpaid hobby project.

Why is it not safe? The Mpeg4 and MKV parsers are written in C# and compiled with the <CheckForOverflowUnderflow> option set to True.