Hacker News new | ask | show | jobs
by nlewycky 653 days ago
I'm a huge proponent of UBSan and ASan. Genuine curiosity, what don't you like about them?

FWIW, there once was a real good-faith effort to clean up the problems, Friendly C by Prof Regehr, https://blog.regehr.org/archives/1180 and https://blog.regehr.org/archives/1287 .

It turns out it's really hard. Let's take an easy-to-understand example, signed integer overflow. C has unsigned types with guaranteed 2's complement rules, and signed types with UB on overflow, which leaves the compiler free to rewrite the expression using the field axioms, if it wants to. "a = b * c / c;" may emit the multiply and divide, or it can eliminate the pair and replace the expression with "a = b;".

Why do we connect interpreting the top bit as a sign bit with whether field axiom based rewriting should be allowed? It would make sense to have a language which splits those two choices apart, but if you do that, either the result isn't backwards compatible with C anyways or it is but doesn't add any safety to old C code even as it permits you to write new safe C code.

Sometimes the best way to rewrite an expression is not what you'd consider "simplified form" from school because of the availability of CPU instructions that don't match simple operations, and also because of register pressure limiting the number of temporaries. There's real world code out there that has UB in simple integer expressions and relies on it being run in the correct environment, either x86-64 CPU or ARM CPU. If you define one specific interpretation for the same expression, you are guaranteed to break somebody's real world "working" code.

I claim without evidence that trying to fix up C's underlying issues is all decisions like this. That leads to UBSan as the next best idea, or at least, something we can do right now. If nothing else it has pedagogical value in teaching what the existing rules are.

1 comments

I'm not sure I understand your example. Existing modern hardware is AFAICT pretty conventionally compatible with multiply/divide operations[1] and will factor equivalently for any common/performance-sensitive situation. A better area to argue about are shifts, where some architectures are missing compatible sign extension and overflow modes; the language would have to pick one, possibly to the detriment of some arch or another.

But... so what? That's fine. Applications sensitive to performance on that level are already worrying about per-platform tuning and always have been. Much better to start from a baseline that works reliably and then tune than to have to write "working" code you then must fight about and eventually roll back due to a ubsan warning.

[1] It's true that when you get to things like multi-word math that there are edge cases that make some conventions easier to optimize on some architectures (e.g. x86's widening multiply, etc...).