|
I think the most critical part in the flow is the integer overflow bug, and it is totally avoidable. I am a software engineer at Microsoft. Half of my time was spent on security and compliance. We have the right tools and the right policies to keep such things from happening. However, I'm not saying Microsoft software is free of integer overflow bugs. I don't intend to advertise Microsoft C/C++ development tools here, but they are the ones I know best.
Let's go to the technical part: if you are asked to implement the binary search algorithm in your favorite programming language, how do you verify your code? Unit tests. How many test cases will you need? More than 10. Binary search implementations are prone to integer overflow bugs (remember the one in the JDK?), but as long as you have enough tests, you don't need to worry too much. But how much is enough? People have failed to implement binary search correctly for decades, not because we don't know the algorithm well enough or because we lack excellent software engineers, but because we don't know how to test our code thoroughly.
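To make the overflow concrete, here is a minimal C sketch. The function names (`naive_mid`, `safe_mid`, `binary_search`) are made up for this illustration, and unsigned 32-bit indices are used only so that the wraparound is well defined and can be observed; with signed `int` indices the same mistake is undefined behavior.

```c
#include <stddef.h>
#include <stdint.h>

/* Classic midpoint: lo + hi can wrap around when both are large. */
uint32_t naive_mid(uint32_t lo, uint32_t hi)
{
    return (uint32_t)(lo + hi) / 2u;
}

/* Overflow-free midpoint: hi - lo never exceeds hi when lo <= hi. */
uint32_t safe_mid(uint32_t lo, uint32_t hi)
{
    return lo + (hi - lo) / 2u;
}

/* Returns the index of key in the sorted array a of length n,
   or -1 if key is not present. Searches the half-open range [lo, hi). */
ptrdiff_t binary_search(const int32_t *a, size_t n, int32_t key)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;  /* same trick as safe_mid */
        if (a[mid] < key)
            lo = mid + 1;
        else if (a[mid] > key)
            hi = mid;
        else
            return (ptrdiff_t)mid;
    }
    return -1;
}
```

A handful of small-array tests will never reach the wrapping case, because it only shows up when `lo + hi` exceeds the range of the index type. That is the kind of input hand-written unit tests tend to miss.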
Any non-trivial C/C++ function may need tens of thousands of test cases. You simply can't write them by hand. You need the right tools: fuzzing and static analysis. At Microsoft, every file parser should go through fuzzing, which basically means you generate some random inputs and then run your tests with them. Nothing fancy. But there is another kind of fuzzing: symbolic execution, which tries to find all the possible execution paths of your code. If you run symbolic execution on your binary search code, you can get 100% test coverage. And the code is then guaranteed free of the classes of bugs the tool checks for. It is like a math proof. Please note that this is only possible because SAT solvers have made surprisingly great advances in the last 20 years. And often you need to make some compromises between your business goals and security. Most functions can't reach 100% test coverage. You need to simplify them. See https://github.com/klee/klee for a quick start. Though C/C++ are often considered unsafe, they have the best fuzzers.
Then there are SAL annotations and the static analyzer. In C, whenever you pass a pointer to an array to another function, you should also pass its length along with it. And in the callee you should check the length. If you forget, your static code analyzer will give you a warning. That way, if you didn't allocate enough memory, the result is only an error code being returned instead of undefined behavior. One last thing: use SafeInt to wrap the size arithmetic around your malloc calls. https://docs.microsoft.com/en-us/cpp/safeint/safeint-library...
When we move from the binary search toy example to a real code base, you can clearly see how much extra effort is needed to make the code safe. Please pardon me for saying so, but most OSS libraries don't have the resources. Many famous OSS projects are "mom-and-pop" shops. They don't have any compliance rules. They invest very little in fuzzing. So the big companies really should help them.
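Coming back to the advice on passing lengths and guarding the size given to malloc, here is a rough sketch in portable C. The names (`alloc_array`, `copy_pixels`, the `RC_*` codes) are hypothetical, and the manual overflow check is a plain-C stand-in for the kind of checked arithmetic SafeInt provides in C++; it is not SafeInt's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical error codes for this sketch. */
enum { RC_OK = 0, RC_OVERFLOW = -1, RC_NOMEM = -2, RC_TOO_SMALL = -3 };

/* Allocates count * elem_size bytes, refusing when the product would
   overflow size_t instead of silently allocating a tiny buffer. */
int alloc_array(size_t count, size_t elem_size, void **out)
{
    *out = NULL;
    if (elem_size != 0 && count > SIZE_MAX / elem_size)
        return RC_OVERFLOW;
    size_t bytes = count * elem_size;
    void *p = malloc(bytes ? bytes : 1);  /* avoid a zero-size request */
    if (p == NULL)
        return RC_NOMEM;
    *out = p;
    return RC_OK;
}

/* The callee receives the destination length and checks it before
   writing. With MSVC, a SAL annotation such as
   _Out_writes_bytes_(dst_len) could document this contract; it is
   left out here to keep the sketch portable. */
int copy_pixels(uint8_t *dst, size_t dst_len,
                const uint8_t *src, size_t src_len)
{
    if (src_len > dst_len)
        return RC_TOO_SMALL;  /* error code instead of a buffer overrun */
    memcpy(dst, src, src_len);
    return RC_OK;
}
```

With this shape, an attacker-controlled `count` that is too large produces `RC_OVERFLOW`, and a buffer that is too small produces `RC_TOO_SMALL`, so the failure is a return value the caller can handle.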
Now you see that an integer overflow bug was found in Apple's image renderer, but was the code written by Apple? Not necessarily. Now we all see the importance of the Open Source movement. It's time to think about how to harden the security of these projects. For example, even if I wanted to spend my free time adding SAL annotations to an OSS project I love, would the maintainers accept them? |
If you think GC performance is not good enough, see this proof of concept: https://github.com/Const-me/Vrmac/tree/master/VrmacVideo/Con... That C# code implements a parser for the MPEG-4 format. That format is way more complicated than GIF or even PDF, yet the code runs fine even on very slow computers (Raspberry Pi 4). There’s another similar parser in that project for the MKV format.