Hacker News (HN Mirror)


	by vgatherps 435 days ago
	Uninitialized memory being UB isn’t an insane default imo (although it makes masked SIMD hard), nor is most UB. But the lack of escape hatches can be frustrating.


adastra22 435 days ago

Anything being UB is insane to me...


dathinab 435 days ago

only until you get deeper into how the hardware actually works (and the OS to some degree)

and realize sometimes the UB is even in the hardware registers

and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug

and other fun like that

so the insanity is reality not the compiler

(though IMHO in C and especially C++ the insanity is how easily you might accidentally run into UB without doing any fancy trickery, just in dumb, not-hot, everyday code)


imtringued 433 days ago

None of what you said makes any sense. You're mixing "UB" and bugs.

UB is about declaring programs invalid with a snide "don't do that", not about incorrect execution due to an incorrect specification. E.g. speculative execution running in privileged mode due to a prior syscall is just a plain hardware bug. It's not undefined behavior. In fact, the bug in question is extremely well defined.

The closest thing to reading undefined behavior is reading a "don't care" or VHDL's 'U' value from a std_ulogic and even those are well defined in simulation, just not in physical hardware, but even there they can only ever be as bad as reading a garbage value. Since a lot of the hardware design is non-programmable, there is also usually no way to exploit it.


dooglius 434 days ago

There is no UB in hardware registers or physical DRAM. I don't think you actually have familiarity with how the hardware works if you make this claim. (Or perhaps you aren't familiar with how crazy "UB" in the sense of the ISO C documentation is.)

EDIT: one could see an "apparent" violation of memory consistency if, say, the cache subsystem or memory controller were misconfigured. However, this would require both that (1) you are running in kernel mode, not user-space, and (2) you have a bug, so GP's claim that bug-free code could encounter such a state is not supported.


addaon 434 days ago

> There is no UB in hardware registers or physical DRAM

This seems very sensitive to specific definitions that others might not share. DRAM is provided with a spec sheet that defines its behavior (if you write to an address, you’ll read back the same value from the same address in the future) under certain conditions. If you violate those conditions, the behavior is… undefined. If you operate DRAM with the wrong refresh timing, or temperature, or voltage, or ionizing radiation level, you may see strange behavior. Even non-local behavior, where the value read from one cell depends on other cells (RowHammer). How is this not UB?


adastra22 434 days ago

If a C program accesses uninitialized memory (UB), it is perfectly compliant with the language spec for the compiler to reformat your hard drive and replace your OS code with crypto mining.

I'm not exaggerating by a legalistic interpretation, and I'm only slightly exaggerating in practice. UB can do some really weird, unintuitive stuff in practice on real hardware:

https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-...

The point is that this extreme UB should never happen. It was a choice of the compiler implementors, and rather than fix this, they allowed the escape hatch of UB in the spec. It would be more sensible for the compiler to say that, e.g., accessing uninitialized memory results in a nonspecified value, or even possibly multiple different nonspecified values if accessed from different threads. That captures what we expect to happen, but would be (according to C language spec lawyers) defined behavior.

In practice, it would mean that compliant compilers will ensure that any situation in which uninitialized memory could be accessed would not result in weird edge case page faults on certain architectures or whatever that could in fact lead to wacky UB situations.

This is not an unreasonable ask.


addaon 434 days ago

> This is not an unreasonable ask.

But isn't this exactly parallel to the Rowhammer case in DRAM? When operating at the edge of the spec, the behavior of DRAM becomes undefined. (And, of course, one challenge with Rowhammer was about /which/ edge of the spec this happened on.) In this case, writing one physical address altered the contents of other physical addresses. This is "really weird, unintuitive stuff … on real hardware." And of course we can (and do) ask DRAM vendors to not take advantage of this undefined behavior; but they do so as an optimization, allowing slightly smaller and more closely spaced DRAM cells, and thus higher density DRAM dice for the same price. Just like it's possible to work with a language with fully-defined semantics at the cost of performance, it's possible to buy DRAM with much wider specifications and more clearly defined behavior up to the edges of those specifications… at the cost of performance.

Extreme UB in both hardware and software is a choice of priorities. You may favor giving up performance capabilities to achieve a more understandable system (so do I! I do most of my work on in-order lockstep processors with on-die ECC SRAM to maximize understandability of failure modes), but the market as a whole clearly does not, in both hardware and software.


dooglius 434 days ago

I've edited my claim to be a bit more clear. However, in the context of parent's claim we are talking about bug-free code on a non-buggy physical processor, and I think implicitly we are talking about user-mode code, where one does not have the ability to alter any DRAM timing configuration registers anyway.


Aurornis 434 days ago

> There is no UB in hardware registers

There most definitely is.

In the ARM documentation this is referred to as “UNPREDICTABLE”. The outcome is not defined. It may work. It may not. It may put garbage data in a register.


dooglius 434 days ago

I've edited/clarified the claim above. Parent did say "without you having a bug" and entering an UNPREDICTABLE state would be buggy code. Also, "maybe puts garbage data in a register" is not UB; it's a much more reasonable thing than UB (the definition of UNPREDICTABLE prior to ARMv8 does seem to allow for UB, however). I don't believe that such an UNPREDICTABLE state is reachable from user-space/unprivileged code (or else it would violate the chip's security properties) -- but if I'm wrong on that I'd be interested in an example.


adastra22 434 days ago

Here are just a few examples:

https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-...


uecker 434 days ago

I do not find it so easy to accidentally run into UB in C if you follow some basic rules. The exceptions are null pointer dereferences, out-of-bounds accesses for arrays, and signed overflow, all of which can be turned into run-time traps. The rules include no pointer arithmetic, no type casts, and having some ownership strategy. None of those is difficult to implement, and where exceptions are made, one should treat them carefully, similar to using "unsafe" in Rust.
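
As a rough illustration of the "turned into run-time traps" point for signed overflow, here is a minimal sketch that is not taken from the thread. `checked_add` is a hypothetical helper name; it tests the operands before adding, so the overflowing addition is never executed. GCC and Clang can also insert such checks automatically (for example with `-ftrapv` or `-fsanitize=signed-integer-overflow`), which is closer to what the comment describes.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper (not from the thread): adds a and b and stores the
   sum in *out. Returns false instead of overflowing, so the signed
   overflow (which would be UB) is never actually performed. */
bool checked_add(int a, int b, int *out)
{
    if ((b > 0 && a > INT_MAX - b) ||
        (b < 0 && a < INT_MIN - b)) {
        return false;   /* would overflow: report it, do not compute */
    }
    *out = a + b;       /* safe: the result is known to fit in an int */
    return true;
}
```

A caller would check the return value and handle the failure (abort, return an error) instead of relying on whatever the hardware happens to do on wraparound.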


IshKebab 435 days ago

Nah it makes some sense for portability between architectures. Or at least it did back when C was invented and there were some wild architectures out there.

And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.


dathinab 435 days ago

> there were some wild architectures out there.

what is out there is still pretty wild

just slightly less

> probably nothing significant on modern out-of-order machines.

having no UB at all will kill a lot of optimizations still relevant today (and won't match the hardware anymore, as some UB is at the hardware level)

out-of-order machines aren't magically fixing that, they just make some less-optimized code run better, but not all of it

and a lot of low-energy/cheap hardware has no or very limited out-of-order capabilities, so it's still very relevant and likely will stay very relevant for a very long time
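
A commonly cited example of the kind of optimization at stake, sketched here under my own assumptions (the function names are hypothetical and this is not from the thread): because signed overflow is UB, a compiler is allowed to assume `x + 1` never wraps and may fold the signed comparison to a constant. The unsigned version has defined wraparound, so the comparison has to be evaluated for real.

```c
#include <limits.h>

/* Signed overflow is UB, so a compiler may assume x + 1 never wraps and
   is allowed to fold this to "return 1". Only call it with x < INT_MAX. */
int signed_next_is_greater(int x)
{
    return x + 1 > x;
}

/* Unsigned arithmetic wraps by definition, so the comparison must
   really be evaluated: it is false when x == UINT_MAX. */
int unsigned_next_is_greater(unsigned x)
{
    return x + 1u > x;
}
```

Whether a given compiler actually performs the fold depends on the compiler and optimization level; the point is only that the language rules permit it for the signed case and forbid it for the unsigned one.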


jcranmer 434 days ago

How would you implement integer-to-pointer conversions without UB?


adastra22 434 days ago

What is UB about integer-to-pointer conversions?


jcranmer 434 days ago

Plenty of things! The resulting pointer may not be pointing to an object that is currently live, for example. It may not even be pointing to an object that makes any sense in the language's object model. It might be pointing to a return address on the stack, for example. Or a constant in the constant pool. Or a saved register in the middle of a computation that corresponds to no variables in the original program.

In short, the moment you enable integer-to-pointer conversions (assuming your target has a flat address space), you create pointer provenance problems whose only resolution is that some things have to be UB.
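
A minimal sketch of the well-behaved end of this spectrum, not taken from the thread (the helper names are hypothetical): a pointer to a live object is converted to `uintptr_t` and back. The C standard guarantees that a `void *` round-tripped through `uintptr_t` compares equal to the original, so this case is fine. The problems described above start when the integer was not derived from a pointer to a live object, for example an address that merely happens to coincide with a stack slot.

```c
#include <stdint.h>

/* Hypothetical helper: round-trips a pointer through an integer.
   The integer is derived from a valid pointer, so converting it back
   yields a pointer that compares equal to the original. */
int *round_trip(int *p)
{
    uintptr_t bits = (uintptr_t)(void *)p;
    return (int *)(void *)bits;
}

/* Reads a live local object through the round-tripped pointer. */
int read_through_round_trip(void)
{
    int x = 42;
    int *q = round_trip(&x);
    return *q;
}
```

How compilers treat less tidy cases (integers that were modified, guessed, or loaded from elsewhere) is exactly what the provenance discussion in this thread is about, and this sketch does not try to settle it.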


adastra22 434 days ago

I don't think any of those are undefined behavior in the strict sense in which the term is defined in the C/C++ standards. Pointer casts are defined behavior. I believe the things you point to are either implementation-defined or unspecified, which is different from UB.

It may seem nitpicky, but the downside of relying on implementation-defined or unspecified behavior is largely boxed and contained. E.g., you might get a memory access error. UB is, in principle, completely unlimited in downside. And because of that, it often interacts badly with optimization passes, resulting in very strange bugs.


pcwalton 434 days ago

jcranmer is correct and pointer provenance-related issues are not "boxed and contained". Start here: https://www.ralfj.de/blog/2020/12/14/provenance.html


jcranmer 434 days ago

Pointer provenance is UB in the C/C++ sense, although it's one that largely lurks in the implementation-defined behavior of "integer/pointer casts are implementation-defined." The closest you're going to find to a full specification of pointer provenance is TS 6010 (https://www.open-std.org/jtc1/sc22/WG14/www/docs/n3226.pdf), but it should be noted that no compilers actually implement provenance strictly along the lines of TS 6010. Instead, they implement a poorly-documented, buggy, internally incoherent definition of provenance...

> It may seem nitpicky, but the downside of relying on implementation defined or unspecified behavior is largely boxed and contained.

... that is incoherent precisely because the interactions of provenance with optimizations aren't boxed and contained.

It's really not until people started putting the model into formal semantics that they realized "hey, wait a second, this means either most of our optimizations are wrong or our semantics are wrong" and the consensus is that it was the semantics that were broken.
