by tialaramex 2 days ago
> Rest assured that you are much more likely to hit a miscompilation in your compiler's backend, and that it is much harder to detect.

The LLVM provenance bug is a really nice example. The Rust which tickles this bug (LLVM emits nonsense, claiming that two integers a and b are different but then calculating that a - b == 0...) is fairly clear: you wouldn't write it by accident, but it's obvious what it should do, and it's unsettling to discover that the bug isn't in Rust's compiler frontend but in LLVM.

You can write equivalent C or C++ to show the bug with Clang - but when you try to write it you'll struggle, not to reproduce the bug per se, but to stop writing Undefined Behaviour, which invalidates your bug report because the LLVM devs will say "This is UB, working as intended". The non-UB reproducers are much more elaborate than the safe Rust was.

1 comments

What is the "LLVM provenance bug"?
https://github.com/llvm/llvm-project/issues/45725

Are you familiar with pointer provenance? This gets very deep, very fast, so if you don't know and don't want to know I can't help you, but if you do want to know, maybe try: https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

So e.g. if we make a pointer to thing A, which we then destroy, and then form a pointer to thing B, and then we compare these pointers, even if the address (which will be the only bits making up this pointer in hardware) was identical, a language could choose to say these pointers are not the same (Rust says they compare equal, but that's up to Rust). LLVM is equipped to do this optimisation. So far, not a bug, though perhaps not what you expected...

However, everybody is pretty sure that we don't want provenance for other value types. It's troublesome for pointers, but we're used to it and it unlocks important optimisations; for every other value type it's just extra trouble. So Rust's provenance model says only pointers have provenance, and proposals for C and C++ likewise. If we ask for the address from a pointer, making it into just an integer, it should no longer have provenance.
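
To make "just an integer" concrete, here's a minimal sketch in safe Rust. The function name is made up for illustration. It takes two raw pointers to the same local, turns both into usize, and checks that comparison and subtraction agree, which is all you'd expect from plain integers:

```rust
/// Hypothetical helper: the addresses of two raw pointers to the same
/// local, converted to plain integers.
fn same_object_addresses() -> (usize, usize) {
    let x = 42i32;
    let p: *const i32 = &x;
    let q: *const i32 = &x;
    // After the cast these are ordinary usize values; only the address is left.
    (p as usize, q as usize)
}

fn main() {
    let (a, b) = same_object_addresses();
    // Both pointers refer to the same object, so the integers are equal
    // and their difference is zero.
    assert_eq!(a, b);
    assert_eq!(a.wrapping_sub(b), 0);
    println!("a == b: {}, a - b == 0: {}", a == b, a.wrapping_sub(b) == 0);
}
```

Nothing surprising happens here; this is the baseline the bug violates.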

But LLVM doesn't really track whether a value "is" a pointer or not per se, so it ends up applying that "they're not equal" optimisation to the integers we've made from pointers, even though the integers are definitely equal and we're about to do a subtraction to prove it. Bang.
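
A rough sketch of the shape of code involved, with hypothetical names. This is not the reproducer from the linked issue, and whether it misbehaves at all will depend on compiler version and optimisation level. It takes the addresses of two short-lived locals as integers, then both compares and subtracts them:

```rust
/// Hypothetical helper: address of a local that is gone by the time we
/// return. Keeping the integer around is fine; we never dereference it.
fn address_of_short_lived_local() -> usize {
    let v = 0u8;
    &v as *const u8 as usize
}

/// For plain integers, equality and "difference is zero" must always agree.
fn agree(a: usize, b: usize) -> bool {
    (a == b) == (a.wrapping_sub(b) == 0)
}

fn main() {
    // Two distinct objects whose storage may or may not be reused.
    let a = address_of_short_lived_local();
    let b = address_of_short_lived_local();
    println!("a == b: {}", a == b);
    println!("a - b == 0: {}", a.wrapping_sub(b) == 0);
    // A correct compiler must print true here whatever the addresses are.
    println!("agree: {}", agree(a, b));
}
```

Either answer to "a == b" is legitimate, since the two locals may or may not land at the same address. What's not legitimate is the two printed answers contradicting each other.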