Hacker News new | ask | show | jobs
by wrs 393 days ago
That assumes the compiler reserves one continuous place for the value, which isn’t always true (hardly ever true in the case of registers). If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options, which might reduce performance (and performance is the whole reason to use uninitialized values!).

Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.

3 comments

> same uninitialized value, that can limit code generation options

it pretty much requires the compiler to initialize all values when they first "appear"

except that this is impossible and outright hazardous if pointers are involved

But doable for a small subset like e.g.

- stack values (but would inhibit optimizations, potentially pretty badly)

- some allocations e.g. I/O buffers, (except C alloc has no idea that you are allocating an I/O buffer)

> If the compiler is required to make all code paths result in the same uninitialized value, that can limit code generation options

Can you provide (on say x86_64) an example of this, other than the case where the compiler prunes cases based on characterizing certain paths as UB? In other words, a case where "an uninitialized value is well-defined but can be different on each read" allows more performance optimization than "the value will be the same on each read".

> Also, an uninitialized value might be in a memory page that gets reclaimed and then mapped in again, in which case (because it hasn’t been written to) the OS doesn’t guarantee it will have the same value the second time. There was recently a bug discovered in one of the few algorithms that uses uninitialized values, because of this effect.

This does not sound correct to me, at least for Linux (assuming one isn't directly requesting such behavior with madvise or something). Do you have more information?

The most obvious general case (to me) is reading an uninitialized local variable in a loop. If uninitialized has to be the same value every time, you’d have to allocate a register or stack space to ensure the value was the same on every iteration. Instead, you’d don’t have to allocate anything, just use whatever value is in any register that’s handy. (By this logic you can also start pruning code, by picking the “most optimal” value for the uninitialized variable.)
I can’t find a citation, but my recollection is the problem happened with the Briggs-Torczon sparse set algorithm, which relies on uninitialized memory not changing. For performance, they were using MMAP_UNINITIALIZED (which has to be enabled with a kernel config).
But, I wonder how much it would reduce performance, if we only have to pick a value the first time the memory is read?

I would imagine there isn't that many cases where we are reading uninitalised memory and counting on that reading not saving a value. It would happen when reading in 8-byte blocks for alignment, but does it happen that much elsewhere?

if you pick a value you have to store it, and if you have to store it it might spill into memory when register allocation fails. Moving from register-only to stack/heap usage easily slows down your program by an order of magnitude or two. If this is in a hot path, which I'd argue it is since using uninitialized values seems senseless otherwise, it might have a big impact.

The only way to really know is to test this. Compilers and their optimizations depend on a lot of things. Even the order and layout of instructions can matter due to the instruction cache. You can always go and make the guarantee later on, but undoing it would be impossible.