Uninitialized memory being UB isn’t an insane default imo (although it makes masked simd hard), nor is most UB. But the lack of escape hatches can be frustrating
only until you get deeper into how the hardware actually works (and the OS, to some degree)
and realize sometimes the UB is even in the hardware registers
and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug
and other fun like that
so the insanity is reality not the compiler
(though IMHO in C, and especially C++, the insanity is how easily you might accidentally run into UB without doing any fancy trickery, just in dumb, not-hot, everyday code)
None of what you said makes any sense. You're mixing "UB" and bugs.
UB is about declaring programs invalid with a snide "don't do that", not about incorrect execution due to an incorrect specification. E.g. speculative execution running in privileged mode due to a prior syscall is just a plain hardware bug. It's not undefined behavior. In fact, the bug in question is extremely well defined.
The closest thing to reading undefined behavior is reading a "don't care" or VHDL's 'U' value from a std_ulogic and even those are well defined in simulation, just not in physical hardware, but even there they can only ever be as bad as reading a garbage value. Since a lot of the hardware design is non-programmable, there is also usually no way to exploit it.
There is no UB in hardware registers or physical DRAM. I don't think you actually have familiarity with how the hardware works if you make this claim. (Or perhaps you aren't familiar with how crazy "UB" in the sense of the ISO C documentation is.)
EDIT: one could see an "apparent" violation of memory consistency if, say, the cache subsystem or memory controller were misconfigured. However, this would require both that (1) you are running in kernel mode, not user-space, and (2) you have a bug, so it does not support GP's claim that bug-free code could encounter such a state.
> There is no UB in hardware registers or physical DRAM
This seems very sensitive to specific definitions that others might not share. DRAM is provided with a spec sheet that defines its behavior (if you write to an address, you’ll read back the same value from the same address in the future) under certain conditions. If you violate those conditions, the behavior is… undefined. If you operate DRAM with the wrong refresh timing, or temperature, or voltage, or ionizing radiation level, you may see strange behavior. Even non-local behavior, where the value read from one cell depends on other cells (RowHammer). How is this not UB?
If a C program accesses uninitialized memory (UB), it is perfectly compliant with the language spec for the compiler to reformat your hard drive and replace your OS code with crypto mining.
I'm not exaggerating by a legalistic interpretation, and I'm only slightly exaggerating in practice. UB can do some really weird, unintuitive stuff in practice on real hardware:
The point is that this extreme UB should never happen. It was a choice of the compiler implementors, and rather than fix this, they allowed the escape hatch of UB in the spec. It would be more sensible for the compiler to say that, e.g., accessing uninitialized memory results in a nonspecified value, or even possibly multiple different nonspecified values if accessed from different threads. That captures what we expect to happen, but would be (according to C language spec lawyers) defined behavior.
In practice, it would mean that compliant compilers will ensure that any situation in which uninitialized memory could be accessed would not result in weird edge case page faults on certain architectures or whatever that could in fact lead to wacky UB situations.
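Until the spec offers something like that, the portable way to get "some value, but definitely a value" is to initialize explicitly. A minimal sketch, where `sum_padded` and `BUF_LEN` are made-up names for illustration: without the `= {0}`, the second loop would read elements that were never written, which is exactly the case being argued about here.

```c
#include <stddef.h>

#define BUF_LEN 8

/* Hypothetical example: copy up to BUF_LEN values into a local buffer,
   then sum the whole buffer, including the slots that were not filled. */
int sum_padded(const int *src, size_t n)
{
    int buf[BUF_LEN] = {0};   /* every element starts as 0 */

    if (n > BUF_LEN) {
        n = BUF_LEN;          /* never write past the buffer */
    }
    for (size_t i = 0; i < n; i++) {
        buf[i] = src[i];
    }

    int total = 0;
    for (size_t i = 0; i < BUF_LEN; i++) {
        total += buf[i];      /* unfilled slots contribute 0 */
    }
    return total;
}
```

The cost is the extra zeroing, which is the kind of overhead the "uninitialized is UB" default is meant to avoid.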
But isn't this exactly parallel to the Rowhammer case in DRAM? When operating at the edge of the spec, the behavior of DRAM becomes undefined. (And, of course, one challenge with Rowhammer was about /which/ edge of the spec this happened on.) In this case, writing one physical address altered the contents of other physical addresses. This is "really weird, unintuitive stuff … on real hardware." And of course we can (and do) ask DRAM vendors to not take advantage of this undefined behavior; but they do so as an optimization, allowing slightly smaller and more closely spaced DRAM cells, and thus higher density DRAM dice for the same price. Just like it's possible to work with a language with fully-defined semantics at the cost of performance, it's possible to buy DRAM with much wider specifications and more clearly defined behavior up to the edges of those specifications… at the cost of performance.
Extreme UB in both hardware and software is a choice of priorities. You may favor giving up performance capabilities to achieve a more understandable system (so do I! I do most of my work on in-order lockstep processors with on-die ECC SRAM to maximize understandability of failure modes), but the market as a whole clearly does not, in both hardware and software.
I've edited my claim to be a bit clearer. However, in the context of parent's claim we are talking about bug-free code on a non-buggy physical processor, and I think implicitly we are talking about user-mode code where one does not have the ability to alter any DRAM timing configuration registers anyway.
In the ARM documentation this is referred to as “UNPREDICTABLE”. The outcome is not defined. It may work. It may not. It may put garbage data in a register.
I've edited/clarified the claim above. Parent did say "without you having a bug" and entering an UNPREDICTABLE state would be buggy code. Also, "maybe puts garbage data in a register" is not UB, it's a much more reasonable thing than UB (the definition of UNPREDICTABLE prior to ARMv8 does seem to allow for UB, however). I don't believe that such an UNPREDICTABLE state is reachable from user-space/unprivileged code (or else it would violate the chip's security properties) -- but if I'm wrong on that I'd be interested in an example.
I do not find it so easy to accidentally run into UB in C if you follow some basic rules. The exceptions are null pointer dereferences, out-of-bounds accesses for arrays, and signed overflow, all of which can be turned into run-time traps. The rules include no pointer arithmetic, no type casts, and having some ownership strategy. None of those is difficult to implement, and where exceptions are made, one should treat them carefully, similar to using "unsafe" in Rust.
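To make the "run-time traps" part concrete: GCC and Clang can instrument these cases with flags such as `-fsanitize=undefined` or `-ftrapv`, but the same checks can also be written by hand. A minimal sketch, assuming you are willing to route the risky operations through helpers (`checked_add` and `checked_get` are hypothetical names, not a standard API):

```c
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: stores a + b in *out.
   Returns false instead of performing an addition that would overflow,
   so the undefined signed overflow never actually happens. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;
    }
    *out = a + b;
    return true;
}

/* Hypothetical helper: null-checked and bounds-checked array read. */
bool checked_get(const int *arr, size_t len, size_t idx, int *out)
{
    if (arr == NULL || idx >= len) {
        return false;
    }
    *out = arr[idx];
    return true;
}
```

A caller decides what "trap" means, for example calling `abort()` when either helper returns false.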
Nah it makes some sense for portability between architectures. Or at least it did back when C was invented and there were some wild architectures out there.
And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.
> probably nothing significant on modern out-of-order machines.
having no UB at all would kill a lot of optimizations still relevant today (and would no longer match the hardware, as some UB is at the hardware level)
out-of-order machines don't magically fix that; they just make some less-optimized code work better, but not all
and a lot of low-energy/cheap hardware has no or very limited out-of-order capabilities, so it's still very relevant and likely will stay very relevant for a very long time
Plenty of things! The resulting pointer may not be pointing to an object that is currently live, for example. It may not even be pointing to an object that makes any sense in the language's object model. It might be pointing to a return address on the stack, for example. Or a constant in the constant pool. Or a saved register in the middle of a computation that corresponds to no variables in the original program.
In short, the moment you enable integer-to-pointer conversions (assuming your target has a flat address space), you create pointer provenance problems whose only resolution is that some things have to be UB.
I don't think any of those are undefined behavior in the strict sense in which the term is defined in the C/C++ standards. Pointer casts are defined behavior. I believe the things you point to are either implementation-defined or unspecified, which is different from UB.
It may seem nitpicky, but the downside of relying on implementation-defined or unspecified behavior is largely boxed and contained. E.g., you might get a memory access error. UB is, in principle, completely unlimited in downside. And because of that, it often interacts badly with optimization passes, resulting in very strange bugs.
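For reference, the one integer/pointer round trip the C standard does pin down is a valid `void *` converted to `uintptr_t` and back, which must compare equal to the original pointer. A minimal sketch, assuming the implementation provides the optional `uintptr_t` type (`round_trip` is a made-up name):

```c
#include <stdint.h>

/* Hypothetical helper: converts a pointer to an integer and back. */
void *round_trip(void *p)
{
    uintptr_t bits = (uintptr_t)p;  /* the integer value is implementation-defined */
    return (void *)bits;            /* compares equal to the original p */
}
```

What this does not cover is an integer that was computed some other way and then cast to a pointer, which is where the examples upthread (return addresses, saved registers) come in.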
Pointer provenance is UB in the C/C++ sense, although it's one that largely lurks in the implementation-defined behavior of "integer/pointer casts are implementation-defined." The closest you're going to find to a full specification of pointer provenance is TS 6010 (https://www.open-std.org/jtc1/sc22/WG14/www/docs/n3226.pdf), but it should be noted that no compilers actually implement provenance strictly along the lines of TS 6010. Instead, they implement a poorly-documented, buggy, internally incoherent definition of provenance...
> It may seem nitpicky, but the downside of relying on implementation defined or unspecified behavior is largely boxed and contained.
... that is incoherent precisely because the interactions of provenance with optimizations aren't boxed and contained.
It's really not until people started putting the model into formal semantics that they realized "hey, wait a second, this means either most of our optimizations are wrong or our semantics are wrong" and the consensus is that it was the semantics that were broken.