by mjevans
849 days ago
Different levels of warnings might be useful.
  -Wub      # Warn _anytime_ there is detected potential undefined behavior, irrespective of if there is an associated optimization.
  -Wubelim  # Warn any time code is eliminated as a result of undefined behavior / assumptions.
  -Wub...   # Any other classes of UB optimizations that change the program as (incorrectly) written.
Again, the goal is to provide feedback that improves the program and possibly educates / reminds the programmer about how their meanings might be misunderstood.
Can I ask you, have you tried any existing tools? Coverity static analysis, Klocwork, PVS-Studio, clang static analysis, tis-interpreter, Frama-C? What did you think of those? If not, why not (how important is the problem to you)?
My understanding of these tools is that they start by marking every spot where UB could potentially happen -- every add is potentially overflowing, every pointer dereference is potentially null or freed or whatnot -- and then they use solvers to prove that the UB does not occur, and report whatever is left. The benefit they have is that they can examine more than one file at a time (the compiler may only look at one .c file at a time) and they have permission to take much longer than compiling.
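To give a feel for how many spots get marked, here is a small sketch. The function `sum_scaled` is made up for illustration and is not taken from any of the tools named above; the comments list the questions an analyzer of this kind would, as I understand it, have to answer before staying quiet.

```cpp
#include <cstddef>

// Hypothetical function, used only to show where an analyzer has work to do.
int sum_scaled(const int *values, std::size_t count, int scale) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        // values[i]       : is values non-null, still alive, and at least
        //                   count elements long?
        // values[i]*scale : can this signed multiply overflow?
        // total + ...     : can this signed add overflow?
        total += values[i] * scale;
    }
    return total;
}
```

Answering those questions usually means looking at every caller, which is where seeing more than one file at a time pays off.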
> -Wubelim # Warn any time code is eliminated as a result of undefined behavior / assumptions.
This doesn't happen: the compiler doesn't detect your UB and then use that to delete your code. Consider this:
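(The snippet that belonged here did not come through. The following is a sketch consistent with the description that follows it. Only the names x, y and p come from the text; the `example` wrapper and its `write_out_of_bounds` flag are assumptions added for this sketch.)

```cpp
#include <cstdio>

// Hypothetical reconstruction: only the names x, y and p come from the text.
// The out-of-bounds part sits behind a flag (an addition for this sketch) so
// that calling example(false) stays well defined.
int example(bool write_out_of_bounds) {
    int x = 4;               // the one definition of x
    int y = 0;
    if (write_out_of_bounds) {
        int *p = &y - 1;     // sizeof(int) bytes below y (4 on typical targets)
        *p = 7;              // undefined behavior if this runs
    }
    return std::printf("%d", x);  // the one use of x; can become printf("%d", 4)
}
```

The flag is only there so the function can be called without performing the bad write. The original presumably did the write unconditionally.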
The compiler sees 'x' mentioned in two places, once where it's defined and once where it's used (picture the compiler building up a graph of the places a value is set (definitions) and the places the value is used, the use-def graph), and replaces the print with "printf("%d", 4);". Since 'x' is then dead, it can be deleted entirely. The rest of the code with 'y' and 'p' executes exactly the way the programmer wrote it: we keep 'y' on the stack, make 'p' a pointer out of bounds by computing the address that is 4 below 'y', and write sizeof(int) bytes representing the value 7 there. We don't really go out of our way to detect UB.
Another way to think about it is that the assumptions we make about your program being free of UB are completely indistinguishable from all the rest of the correct and working code. "int x = 4;" should declare a new variable, named x, with an int's worth of memory, initialized to the value 4. That is precisely as true to the compiler as any UB-performing code. When you write "p->xcoord" you are telling the compiler that 'p' is a valid pointer to an object of its type at this moment, and it believes you. Trust the programmer, and all that.
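To make the "p->xcoord" point concrete, here is a minimal sketch. The `point` struct and the `get_x` function are made up for illustration. Once the dereference has happened, the later null check tests something the program has already claimed, so an optimizer is allowed to treat it as always false. Nothing "detected UB" along the way; the check is simply redundant given what the code said.

```cpp
// Hypothetical type and function names, made up for this sketch.
struct point {
    int xcoord;
    int ycoord;
};

int get_x(const point *p) {
    int x = p->xcoord;     // the dereference says: p is a valid pointer here
    if (p == nullptr) {    // given the line above, never true in a valid program
        return -1;         // so this branch may be dropped by the optimizer
    }
    return x;
}
```

For any valid pointer the function returns the same value with or without the check, which is all the compiler has to preserve.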
> -Wub... # Any other classes of UB optimizations that change the program as (incorrectly) written.
"UB optimizations" isn't a thing. It just isn't. The optimizations never change the program, at least, not unless the compiler is buggy. The compiler's job is to find some assembly which meets the specification we call the program. With the optimizer enabled, we spend more time so that we can select assembly that minimizes a cost model we have for the execution time on the underlying machine (or sometimes file size).
> Again, the goal is to provide feedback that improves the program and possibly educates / reminds the programmer about how their meanings might be misunderstood.
FWIW we agree on the goal.
The model for warnings in clang at least has been to look at the code as it is typed, and focus on errors that programmers make. We have all kinds of complex rules for warnings, like "if (3 < 4)" issues a warning (-Wtautological-compare) but "if (MAX_THREADS < MAX_CORES)" with #define MAX_THREADS 3 and #define MAX_CORES 4 doesn't. We've put a ton of effort into getting this sort of thing right, and that includes warnings that code will always produce UB when run, even if it was expanded through macros or templates. It's not an exhaustive system; the warnings work was guided by actual bugs we've encountered in real systems.
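Spelled out as code, the two cases look like this. The function names `literal_compare` and `macro_compare` are made up for the sketch. Both conditions are identical after preprocessing; the difference described above is only in how the code was typed. Whether a given clang release actually reports the first one is worth checking against your own compiler.

```cpp
#define MAX_THREADS 3
#define MAX_CORES 4

// Literals typed directly: the kind of comparison described as warning-worthy.
int literal_compare() {
    if (3 < 4) {
        return 1;
    }
    return 0;
}

// Same comparison after preprocessing, but written through configuration
// macros, so it is described as intentionally not diagnosed.
int macro_compare() {
    if (MAX_THREADS < MAX_CORES) {
        return 1;
    }
    return 0;
}
```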
There might be another way to do this. The C++ constexpr feature has the compiler evaluate some functions at compile time and detect any UB they encounter as they run. The clang implementation of this can also handle working with values that are not known at compile time, and working with dynamic allocations. One could try to run every function with the constexpr evaluator and see whether it does a better job at producing good warnings, then remove the redundant warnings (made by pattern matching on the AST) and see if the result is fast enough to use as part of regular compilation.
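A small sketch of what the constexpr evaluator already does today. The function `add_one` and the variable names are made up. Signed overflow is UB, so a call that overflows is not a constant expression and the compiler has to reject it when it is used to initialize a constexpr variable. The same function called with a runtime value gets no such check, and that gap is what the idea above would try to close.

```cpp
#include <climits>

// Hypothetical example function.
constexpr int add_one(int x) {
    return x + 1;   // signed overflow (UB) if x == INT_MAX
}

// Fine: evaluated at compile time, no UB encountered.
constexpr int ok = add_one(41);
static_assert(ok == 42, "evaluated at compile time");

// Uncommenting the next line should be rejected by the compiler: the
// overflow is UB, so the call is not a constant expression.
// constexpr int bad = add_one(INT_MAX);

// Same function with a value only known at run time: nothing is checked.
int add_one_runtime(int x) {
    return add_one(x);
}
```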