Hacker News
by CJefferson 393 days ago
I agree, I'd go further and say I wonder why primitive types aren't "frozen" by default.

I totally understand not wanting to promise things get zeroed, but I don't really understand why full UB, instead of just "they have whatever value is initially in memory / the register / the compiler chose" is so much better.

Has anyone ever done a performance comparison between UB and freezing I wonder? I can't find one.

That assumes the compiler reserves one continuous place for the value, which isn’t always true (hardly ever true in the case of registers). If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options, which might reduce performance (and performance is the whole reason to use uninitialized values!).

Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.

> same uninitialized value, that can limit code generation options

it pretty much requires the compiler to initialize all values when they first "appear"

except that this is impossible and outright hazardous if pointers are involved

But doable for a small subset like e.g.

- stack values (but would inhibit optimizations, potentially pretty badly)

- some allocations e.g. I/O buffers, (except C alloc has no idea that you are allocating an I/O buffer)

> If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options

Can you provide (on say x86_64) an example of this, other than the case where the compiler prunes cases based on characterizing certain paths as UB? In other words, a case where "an uninitialized value is well-defined but can be different on each read" allows more performance optimization than "the value will be the same on each read".

> Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.

This does not sound correct to me, at least for Linux (assuming one isn't directly requesting such behavior with madvise or something). Do you have more information?

The most obvious general case (to me) is reading an uninitialized local variable in a loop. If uninitialized has to be the same value every time, you’d have to allocate a register or stack space to ensure the value was the same on every iteration. Instead, you don’t have to allocate anything, just use whatever value is in any register that’s handy. (By this logic you can also start pruning code, by picking the “most optimal” value for the uninitialized variable.)

I can’t find a citation, but my recollection is the problem happened with the Briggs-Torczon sparse set algorithm, which relies on uninitialized memory not changing. For performance, they were using MAP_UNINITIALIZED (which has to be enabled with a kernel config).
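
For readers who haven't seen the Briggs-Torczon sparse set mentioned above, here is a minimal sketch in C. All names (`SparseSet`, `ss_add`, `ss_contains`, `ss_demo`) are made up for illustration. One deliberate deviation: this version allocates with `calloc` so it stays within defined behavior, whereas the trick under discussion skips that step and relies on the membership check tolerating whatever garbage sits in `sparse`, provided the garbage does not change between reads.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names; a sketch of a Briggs-Torczon style sparse set. */
typedef struct {
    unsigned *dense;    /* dense[0..n-1] lists the current members */
    unsigned *sparse;   /* sparse[x] is x's claimed index into dense */
    unsigned n;         /* current number of members */
    unsigned universe;  /* members must be < universe */
} SparseSet;

bool ss_init(SparseSet *s, unsigned universe) {
    assert(s != NULL && universe > 0);
    /* Assumption: calloc keeps this sketch well-defined. The original
       trick leaves both arrays uninitialized to get O(1) setup. */
    s->dense = calloc(universe, sizeof *s->dense);
    s->sparse = calloc(universe, sizeof *s->sparse);
    s->n = 0;
    s->universe = universe;
    if (s->dense == NULL || s->sparse == NULL) {
        free(s->dense);
        free(s->sparse);
        s->dense = NULL;
        s->sparse = NULL;
        return false;
    }
    return true;
}

bool ss_contains(const SparseSet *s, unsigned x) {
    if (x >= s->universe) return false;
    unsigned i = s->sparse[x];
    /* A stale or arbitrary sparse[x] is rejected unless dense confirms it. */
    return i < s->n && s->dense[i] == x;
}

void ss_add(SparseSet *s, unsigned x) {
    if (x >= s->universe || ss_contains(s, x)) return;
    s->dense[s->n] = x;
    s->sparse[x] = s->n;
    s->n++;
}

/* O(1) clear: the old contents of both arrays simply become stale. */
void ss_clear(SparseSet *s) { s->n = 0; }

void ss_free(SparseSet *s) {
    free(s->dense);
    free(s->sparse);
}

/* Adds 3, 7, and 3 again, then clears; returns the member count before
   the clear, or 0 if allocation failed or a stale entry leaked through. */
unsigned ss_demo(void) {
    SparseSet s;
    if (!ss_init(&s, 16)) return 0;
    ss_add(&s, 3);
    ss_add(&s, 7);
    ss_add(&s, 3);
    unsigned count = s.n;
    ss_clear(&s);
    if (ss_contains(&s, 3)) count = 0;
    ss_free(&s);
    return count;
}
```

The check in `ss_contains` is what makes stale entries harmless, but it reads `sparse[x]` and `dense[i]` one after the other and only works if neither value changes in between. That is the property the comment above says was violated.
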
But, I wonder how much it would reduce performance, if we only have to pick a value the first time the memory is read?

I would imagine there aren't that many cases where we are reading uninitialised memory and counting on that read not having to save a value. It would happen when reading in 8-byte blocks for alignment, but does it happen that much elsewhere?

If you pick a value you have to store it, and if you have to store it, it might spill into memory when register allocation fails. Moving from register-only to stack/heap usage easily slows down your program by an order of magnitude or two. If this is in a hot path, which I'd argue it is since using uninitialized values seems senseless otherwise, it might have a big impact.

The only way to really know is to test this. Compilers and their optimizations depend on a lot of things. Even the order and layout of instructions can matter due to the instruction cache. You can always go and make the guarantee later on, but undoing it would be impossible.

Uninitialized memory being UB isn’t an insane default imo (although it makes masked SIMD hard), nor is most UB. But the lack of escape hatches can be frustrating.

Anything being UB is insane to me...

only until you get deeper into how the hardware actually works (and the OS to some degree)

and realize sometimes the UB is even in the hardware registers

and that the same logical memory address might have 5 different values in hardware at the same time without you having a bug

and other fun like that

so the insanity is reality not the compiler

(though IMHO in C and especially C++ the insanity is how easily you might accidentally run into UB without doing any fancy trickery, just in dumb, not-hot, everyday code)

None of what you said makes any sense. You're mixing "UB" and bugs.

UB is about declaring programs invalid with a snide "don't do that", not about incorrect execution due to an incorrect specification. E.g. speculative execution running in privileged mode due to a prior syscall is just a plain hardware bug. It's not undefined behavior. In fact, the bug in question is extremely well defined.

The closest thing to reading undefined behavior is reading a "don't care" or VHDL's 'U' value from a std_ulogic and even those are well defined in simulation, just not in physical hardware, but even there they can only ever be as bad as reading a garbage value. Since a lot of the hardware design is non-programmable, there is also usually no way to exploit it.

There is no UB in hardware registers or physical DRAM. I don't think you actually have familiarity with how the hardware works if you make this claim. (Or perhaps you aren't familiar with how crazy "UB" in the sense of the ISO C documentation is.)

EDIT: one could see an "apparent" violation of memory consistency if, say, the cache subsystem or memory controller were misconfigured. However, this would require both that (1) you are running in kernel mode, not user-space, and (2) you have a bug, so GP's claim that bug-free code could encounter such a state is not supported.

> There is no UB in hardware registers or physical DRAM

This seems very sensitive to specific definitions that others might not share. DRAM is provided with a spec sheet that defines its behavior (if you write to an address, you’ll read back the same value from the same address in the future) under certain conditions. If you violate those conditions, the behavior is… undefined. If you operate DRAM with the wrong refresh timing, or temperature, or voltage, or ionizing radiation level, you may see strange behavior. Even non-local behavior, where the value read from one cell depends on other cells (RowHammer). How is this not UB?

If a C program accesses uninitialized memory (UB), it is perfectly compliant with the language spec for the compiler to reformat your hard drive and replace your OS code with crypto mining.

I'm not exaggerating by a legalistic interpretation, and I'm only slightly exaggerating in practice. UB can do some really weird, unintuitive stuff in practice on real hardware:

https://mohitmv.github.io/blog/Shocking-Undefined-Behaviour-...

The point is that this extreme UB should never happen. It was a choice of the compiler implementors, and rather than fix this, they allowed the escape hatch of UB in the spec. It would be more sensible for the compiler to say that, e.g., accessing uninitialized memory results in a nonspecified value, or even possibly multiple different nonspecified values if accessed from different threads. That captures what we expect to happen, but would be (according to C language spec lawyers) defined behavior.

In practice, it would mean that compliant compilers will ensure that any situation in which uninitialized memory could be accessed would not result in weird edge case page faults on certain architectures or whatever that could in fact lead to wacky UB situations.

This is not an unreasonable ask.

I've edited my claim to be a bit clearer. However, in the context of parent's claim we are talking about bug-free code on a non-buggy physical processor, and I think implicitly we are talking about user-mode code where one does not have the ability to alter any DRAM timing configuration registers anyway.

> There is no UB in hardware registers

There most definitely is.

In the ARM documentation this is referred to as “UNPREDICTABLE”. The outcome is not defined. It may work. It may not. It may put garbage data in a register.

I've edited/clarified the claim above. Parent did say "without you having a bug" and entering an UNPREDICTABLE state would be buggy code. Also, "maybe puts garbage data in a register" is not UB, it's a much more reasonable thing than UB (the definition of UNPREDICTABLE prior to ARMv8 does seem to allow for UB, however). I don't believe that such an UNPREDICTABLE state is reachable from user-space/unprivileged code (or else it would violate the chip's security properties) -- but if I'm wrong on that I'd be interested in an example.

I do not find it so easy to accidentally run into UB in C if you follow some basic rules. The exceptions are null pointer dereferences, out-of-bounds accesses for arrays, and signed overflow, all of which can be turned into run-time traps. The rules include no pointer arithmetic, no type casts, and having some ownership strategy. None of those is difficult to implement and where exceptions are made, one should treat them carefully, similar to using "unsafe" in Rust.
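
As a rough illustration of "turned into run-time traps": GCC and Clang offer options such as `-fsanitize=undefined` or `-ftrapv` for this, but the same idea can be sketched by hand in portable C. The helper names below (`add_overflows`, `trapping_add`, `checked_get`) are hypothetical, and calling `abort()` is just one possible way to trap.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* True if a + b would overflow int. The check itself never overflows,
   because it only subtracts in the direction that stays in range. */
bool add_overflows(int a, int b) {
    if (b > 0) return a > INT_MAX - b;
    if (b < 0) return a < INT_MIN - b;
    return false;
}

/* Signed addition that aborts instead of invoking undefined behavior. */
int trapping_add(int a, int b) {
    if (add_overflows(a, b)) abort();
    return a + b;
}

/* Array read that aborts on a null pointer or an out-of-bounds index. */
int checked_get(const int *arr, size_t len, size_t i) {
    if (arr == NULL || i >= len) abort();
    return arr[i];
}
```

With helpers like these, `trapping_add(INT_MAX, 1)` terminates the program at a known point rather than leaving the outcome up to the optimizer.
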
Nah it makes some sense for portability between architectures. Or at least it did back when C was invented and there were some wild architectures out there.

And it definitely does allow some optimisation. But probably nothing significant on modern out-of-order machines.

> there were some wild architectures out there.

what is out there is still pretty wild

just slightly less

> probably nothing significant on modern out-of-order machines.

having no UB at all will kill a lot of optimizations still relevant today (and it would no longer match the hardware, as some UB exists at the hardware level)

out-of-order machines aren't magically fixing that; they just make some less optimized code work better, but not all

and a lot of low-energy/cheap hardware has no or very limited out-of-order capabilities, so it's still very relevant and likely will stay very relevant for a very long time

How would you implement integer-to-pointer conversions without UB?
What is UB about integer-to-pointer conversions?
Plenty of things! The resulting pointer may not be pointing to an object that is currently live, for example. It may not even be pointing to an object that makes any sense in the language's object model. It might be pointing to a return address on the stack, for example. Or a constant in the constant pool. Or a saved register in the middle of a computation that corresponds to no variables in the original program.

In short, the moment you enable integer-to-pointer conversions (assuming your target has a flat address space), you create pointer provenance problems whose only resolution is that some things have to be UB.
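
To make the distinction concrete, here is a small sketch of the one integer-to-pointer pattern that C does bless: round-tripping a valid pointer through `uintptr_t` (an optional type, though mainstream platforms provide it). The function names are hypothetical. The cases listed above are the ones where the integer did not come from a live object's pointer in the first place.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round-trips a valid object pointer through an integer and reads
   through the result. The integer's value is implementation-defined,
   but converting it back yields a pointer equal to the original. */
int read_via_integer(int *p) {
    assert(p != NULL);
    uintptr_t bits = (uintptr_t)(void *)p;
    int *q = (void *)bits;
    return *q;
}

/* By contrast, an integer that was never derived from a live object
   (say, a guessed stack address) has no provenance, and dereferencing
   the resulting pointer is where the problems described above begin.
   This sketch deliberately does not do that. */
int roundtrip_demo(void) {
    int value = 42;
    return read_via_integer(&value);
}
```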

I don't think any of those are undefined behavior in the strict sense in which the term is defined in the C/C++ standards. Pointer casts are defined behavior. I believe the things you point to are either implementation-defined or unspecified, which is different from UB.

It may seem nitpicky, but the downside of relying on implementation-defined or unspecified behavior is largely boxed and contained. E.g., you might get a memory access error. UB is, in principle, completely unlimited in downside. And because of that, it often interacts badly with optimization passes, resulting in very strange bugs.

> why primitive types aren't "frozen" by default.

it kills _a lot_ of optimizations leading to problematic perf. degradation

TL;DR: always freezing I/O buffers => yes no issues (in general); freezing all primitives => perf problem

(at least in practice; in theory many might still be possible, but with a much higher analysis compute cost (like exponentially higher) and potentially needing more high-level information (so bad luck C)).

still, for I/O buffers of primitive enough types `frozen` is basically always just fine (I also vaguely remember some discussion about people more involved in Rust core development probably wanting to add some functionality like that, so it might still happen).

To illustrate why frozen I/O buffers are just fine: some systems already (zero or rand) initialize all their I/O buffers anyway. And a lot of systems reuse I/O buffers: they init them once on startup and then just continuously re-use them. And some OS setups do (zero or rand) initialize all OS memory allocations (though that is for the OS granting more memory to your in-process memory allocator, not for every lang-specific alloc call, and it doesn't remove UB for stack or register values at all (nor for various situations related to heap values either)).

So doing much more "costly" things than just freezing them is pretty much normal for I/O buffers.
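
The "init once on startup, then reuse" pattern described above might look roughly like the following in C. The names (`acquire_io_buffer`, `fake_read`, `checksum_whole_buffer`) and the buffer size are assumptions for illustration, and `fake_read` is only a stand-in for a real short read.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

enum { IO_BUF_SIZE = 4096 };  /* hypothetical size */

static unsigned char *io_buf = NULL;

/* Returns the shared buffer, allocating and zeroing it on first use
   only. Later calls hand back the same memory without re-clearing it. */
unsigned char *acquire_io_buffer(void) {
    if (io_buf == NULL) {
        io_buf = malloc(IO_BUF_SIZE);
        if (io_buf == NULL) return NULL;
        memset(io_buf, 0, IO_BUF_SIZE);  /* one-time initialization */
    }
    return io_buf;
}

/* Stand-in for a short read: copies at most IO_BUF_SIZE bytes of src. */
size_t fake_read(unsigned char *buf, const char *src) {
    assert(buf != NULL && src != NULL);
    size_t n = strlen(src);
    if (n > IO_BUF_SIZE) n = IO_BUF_SIZE;
    memcpy(buf, src, n);
    return n;
}

/* Sums every byte, including those past the end of the last read.
   This is well-defined because each byte is either zero or left over
   from an earlier write, never indeterminate. */
unsigned long checksum_whole_buffer(const unsigned char *buf) {
    unsigned long sum = 0;
    for (size_t i = 0; i < IO_BUF_SIZE; i++) sum += buf[i];
    return sum;
}
```

The one-time `memset` costs more than a hypothetical freeze would, yet it is paid once rather than per read.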

Though, as mentioned, sometimes things are undefined rather than frozen at the hardware level (e.g. every read might return a different value). It's a bit of a niche issue you probably won't run into wrt. I/O buffers and I'm not sure how common it is on modern hardware, but it's still a thing.

But freezing primitives that majorly affect control flow both makes some optimizations impossible and makes others much harder to compute/check/find, potentially to the point where they're no longer viable.

This can involve (as in freezing can prevent) some forms of dead code elimination, some forms of inlining + unrolling + const propagation, etc. This is mostly (but not exclusively) about micro-optimizations, but micro-optimizations add up and can lead to (potentially, but not always) major performance regressions. Frozen also has some subtle interactions with floats and their different NaN values (can be a problem especially wrt. signaling NaNs).

Though I'm wondering if a different C/C++ where arrays of primitives are always treated as frozen (and with no signaling NaNs) would have worked just fine without any noticeable perf. drawback. And if so, whether Rust should adopt this...