Hacker News new | ask | show | jobs
by beeforpork 34 days ago
The UB in unaligned pointers is even worse: an unaligned pointer in itself is UB, not only an access to it. So even implicit casting a void*v to an int*i (like 'i=v' in C or 'f(v)' when f() accepts an int*) is UB if the cast pointer is not aligned to int.

It is important to understand that this is a C level problem: if you have UB in your C program, then your C program is broken, i.e., it is formally invalid and wrong, because it is against the C language spec. UB is not on the HW, it has nothing to do with crashes or faults. That cast from void* to int* most likely corresponds to no code on the HW at all -- types are in C only, not on the HW, so a cast is a reinterpretation at C level -- and no HW will crash on that cast (because there is not even code for it). You may think that an integer value in a register must be fine, right? No, because it's not about pointers actually being integers in registers on your HW, but your C program is broken by definition if the cast pointer is unaligned.

6 comments

Author here.

> an unaligned pointer in itself is UB

Yup. Per the "Actually, it was UB even before that" section in the post.

> UB is not on the HW, it has nothing to do with crashes or faults

Yeah. I tried to convey this too, but I'm also addressing the people who say "but it's demonstrably fine", by giving examples. Because it's not.

Which is totally fine and expected for any decent programmer. Casting pointers is clearly here be dragons territory.
Many, many programmers come to C (and C++) with a lower-level understanding that actually gets in the way here. They understand that all types "are" just bytes and that all pointers "are" just register-sized integer addresses, because that's how the hardware works and has worked for decades.

It's perfectly reasonable to expect any load through `int*` to just load 4 bytes from memory, done and done. They get surprised that it is far from the whole story, and the result is UB.

Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead. But no.

> They understand that all types "are" just bytes and that all pointers "are" just register-sized integer addresses, because that's how the hardware works and has worked for decades.

I'd clarify this with "They understand that all values are just bytes".

> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead.

It's partly the standards fault here - rather than saying "We don't know how vendors will implement this, so we shall leave it as implementation-defined", they say "We don't know how vendors will implement this, so we will leave it as undefined".

A clear majority of the UB problems with C could be fixed if the standards committee slowly moved all UB into IB. It's not that there isn't any progress (Signed twos-complement is coming, after all), it's that there is (I believe) much pushback from compiler authors (who dominate the standards) who don't want to make UB into IB.

> A clear majority of the UB problems with C could be fixed if the standards committee slowly moved all UB into IB

There is no such thing as getting rid of "all UB."

What behavior is the implementation supposed to prescribe for a write to an unpredictable garbage address you read from the network? It could overwrite your code. It could overwrite any value anywhere. It could overlap with anything. Prescribing defined behavior for absolutely everything would require defining a precise, unoptimizable 1-to-1 mapping to assembly code and disallowing any multithreading.

> What behavior is the implementation supposed to prescribe for a write to an unpredictable garbage address you read from the network? I

"The compiler is not allowed to elide a write to a garbage address".

Wasn't that easy?

No, because that doesn't mean anything. All the other guarantees the compiler has to make about the other parts of your program have potentially been violated by the write, so the compiler can't guarantee any particular behavior.

Hence, "undefined behavior," of the entire program, not just that particular write.

Turning undefined behavior into implementation defined behavior is rarely a fix, though.
It's a fix that removes the most pointy part of UB.

"Going past the end of the array results in addressing arbitrary values" I can live with. "Going past the end of an array results in anything happening" is a hard sell.

Is that really a meaningful distinction?

Once you are addressing arbitrary values you are firmly in the realm of "anything happening" in practice, but you've now given up optimization opportunities. As has been repeatedly demonstrated over the years, once memory safety breaks it is practically impossible to make any guarantees about program behavior.

I think it’s a really easy sell, actually: if you go past the end of the array far enough you end up accessing the stack which includes parts of the program like “where does this function return to” or “what is the index used to perform this access” or “there is no page mapped there”. None of these are arbitrary values.
Are you talking about creating a pointer (more than one item) past an array, or dereferencing that pointer? Both are currently UB.

For the former, I kinda get it. It may need to be there for cases like with segmented address space where p+10 could actually be a value less than p, for the eventually generated assembly. Maybe it should be fine to create such a pointer, but have it be "indeterminate value" or whatever, if you try to compare that pointer to anything? I don't know enough about compiler internals to say one way or the other.

Dereferencing, though, can only be UB. There may not be a "value" behind that address. There may be a motor that's been I/O mapped, or a self destruct button.

>It's partly the standards fault here - rather than saying "We don't know how vendors will implement this, so we shall leave it as implementation-defined", they say "We don't know how vendors will implement this, so we will leave it as undefined

I'd agree to a point. I still think it's unreasonable for compiler writers to get all lawyery about precise terminology. After all "implementation defined" could still be subject to the same lawyeriness (we implemented it, ergo we define it).

To me this is an issue of culture. We need to push back against the view that UB means anything can happen, therefore the compiler can do anything.

But it's genuinely useful. In all seriousness, are you sure you aren't perhaps just using the wrong language? At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

That said, I think there are many cases where compilers could make a better effort to link UB they're optimizing against to UB that appears in the code as originally authored and emit a diagnostic or even error out. But at least we've got ubsan and friends so it seems like things are within reason if not optimal.

>are you sure you aren't perhaps just using the wrong language

Well I think there is a tension here. C is the language for microcontrollers and the language for high performance.

In ye olden days both groups interests were aligned because speed in C was about working with the machine. Now the UB has been highjacked for speed, that microcontroller that I'm working on, where I know and int will overflow and rely on that is UB so may be optimised out, so I then have to think about what the compiler may do.

I wouldn't say C is the wrong language. I would say there are wrong compilers though.

> At this point UB and leveraging it for optimization are core parts of the most performant C implementations.

I am skeptical that NULL-pointer checks being removed contribute anything more than a rounding error in performance gains in any non-trivial program.

This series was a good explanation for me of why treating UB this way is genuinely useful: https://blog.llvm.org/2011/05/what-every-c-programmer-should...

Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

> Being able to assume certain things don't happen is powerful when you're writing optimisations, not doing that would have a real performance cost

A few of those are significant performance gains, the majority are not.

Emitting the instruction for a NULL pointer dereference is effectively no more costly than not emitting that instruction.

It's the code removal that's killing me.

Right. But to take the first example, the value of initialised memory.

It's undefined so it doesn't have to be zeroed therefore increasing efficiency.

But it's also UB so if you do know that memory contains something, you can't take advantage of that because it's UB. Having it UB is fine. It's the compilers assuming UB can't happen and optimising it away.

> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead. But no.

Not if those 4 bytes span a cacheline boundary, that will most likely result in 1/2 throughput compared to loading values inside a single cacheline. And if it causes cache-misses it takes up twice the L2 or L3 bandwidth.

Even worse, if the int spans two pages, it will need two TLB lookups. If it's a hot variable and the only thing you use from those pages, it even uses up an additional TLB entry, that could otherwise be used for better perf elsewhere, etc.

And if you're on embedded (and many C programs are), Cortex-M CPUs either can't handle unaligned accesses (M0, M0+) or take 2-3 times as long (split the load into 2x2 byte or 1x2 + 2x1 byte)

I don’t think any of that is justification for making unaligned access UB. It’s reason to avoid it or discourage it in certain scenarios, but it’s infinitesimally rare that loading 8 bytes instead of 4 is even measurable, and that includes embedded.
> that all pointers "are" just register-sized integer addresses

And crucially until DR#260 https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm this was a reasonable guess as to what the pointers are. Probably not a wise guess because it's not how your C compiler worked even then, but a reasonable guess if you didn't think too hard about this.

One way I like to think about this is that all C's types are just the machine integers wearing crap Halloween costumes. Groucho glasses for bool, maybe a Lincoln hat for char, float and double can be bright orange make-up and a long tie. But the pointers are different, because unlike the other types those have provenance.

5 == 5, 'Z' == 'Z', true == true, 1.5f == 1.5f, but whether two pointers are equivalent does not depend solely on their bit pattern in C.

I'm not sure that's right. For instance, the Pentium 4 spec explicitly says unaligned int32 loads take longer. And x86/x64 is very gentle in that regard, other archs would whip you. So an unaligned int access is rightfully treated differently. It should be IB.

Just creating the pointer, though, should not be UB, even though it apparently is. It should not even be IB.

Also, it’s been way more than a decade since Pentium 4 was remotely relevant.
> Meanwhile, the actual computers we have been using for decades have no problems actually just loading 4 bytes through any arbitrary pointer with zero overhead.

PCs yes, but there are many other things C is compiled to for which this is not true.

C isn't a programming language. It's not even portable assembly. It's a vague suggestion of a program that might or might not be feasible to run on a target computer and the compiler and other diagnostic tools are under no obligation whatsoever to help you find out what, if anything, is wrong with your program. It's user hostile and should be relegated to the bad old days.
Except ARM32. ARM64 doesn't guarantee it to be valid in all cases either.
Yes but casting pointers is virtually required in any non-trivial C program, and frankly even a lot of the trivial ones, because there's no other way to do type erasure or generics. Well, there kind of are now, and there's always been macros, but void * has historically been the predominant way this is done at runtime.
>an unaligned pointer in itself is UB, not only an access to it.

Can someone point to where the standard states this?

I think this is 6.3.2.3.7 in C99 about casting between pointer types:

> If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined.

However, unless I’m missing something, producing such a pointer from an integer is apparently not insta-UB? 6.3.2.2.5:

> An integer may be converted to any pointer type. Except as previously specified, the result is implementation-defined, might not be correctly aligned, might not point to an entity of the referenced type, and might be a trap representation

And later on 6.5.3.2.4:

> If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.

Which implies that the invalid pointer must have been obtained without being already undefined, right?

The problem with C UBI is that originally it meant the compiler has the freedom to map your code to the hardware inspite of machine instructions differing slightly between one another. The same C program may express different behaviour depending on which architecture it is running on.

This type of UB is fine and nobody really complains about hardware differences leading to bugs.

However, over time aggressive readings of UB evolved C into an implicit "Design by Contract" language where the constraints have become invisible. This creates a similar problem to RAII, where the implicit destructor calls are invisible.

When you dereference a pointer in C, the compiler adds an implicit non-nullable constraint to the function signature. When you pass in a possibly nullable pointer into the function, rather than seeing an error that there is no check or assertion, the compiler silently propagates the non-nullable constraint onto the pointer. When the compiler has proven the constraints to be invalid, it marks the function as unreachable. Calls to unreachable functions make the calling function unreachable as well.

> The problem with C UBI is that originally it meant the compiler has the freedom to map your code to the hardware inspite of machine instructions differing slightly between one another. The same C program may express different behaviour depending on which architecture it is running on.

You're conflating undefined behavior with implementation-defined behavior. If it was only to do with what we think of as normal variance between processors, then it would be easy to make it implementation-defined behavior instead.

The differentiating factor of undefined behavior is that there are no constraints on program behavior at that point, and it was introduced to handle cases where processor or compiler behavior cannot be meaningfully constrained. One key class is of course hardware traps: in the presence of compiler optimizations, it is effectively impossible to make any guarantees about program state at the time of a trap (Java tried, and most people agreed they failed); but even without optimizations, there are processors that cannot deliver a trap at a precise point of execution and thus will continue to execute instructions after a trapping instruction.

Does that mean that if I have a struct with #pragma pack(push, 1) I can't use pointers to any members that don't happen to be aligned?
This is a non-standard extension, so your compiler may provide stronger guarantees.
In practice, both GCC and Clang consider pointers to these unaligned members to be UB and will flag them in UBSAN.
But that seems obvious. You can't load an integer from an unaligned address.

It's not only C-level is it. There's no (guarantee across architectures for) machine code for that either.

> You can't load an integer from an unaligned address.

You can, and the results are machine specific, clearly defined and well-documented. Ancient ARM raises an exception, modern ARM and x86 can do it with a performance penalty. It's only the C or C++ layer that is allowed to translate the code into arbitrary garbage, not the CPU.

There’s usually not a performance penalty on modern hardware
There's typically only a performance penalty if the unaligned load spans a cache line on modern hardware.
Sure you can. In many architectures it works just fine. Works perfectly in x86_64, for example. It's just a little slower.
In many architectures does not mean you can. The standard is supposed to cover all architectures.
If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead. Load multiple integers and shift and mask away the irrelevant bits, done. This is exactly what modern architectures already do in hardware. Works, it's just a little slower.

This is exactly what the compilers do if you use a packed structure to access unaligned data. Works everywhere, as expected. Compilers have always known what to do, they just weren't doing it. C standard says no.

The fact is the standard is garbage and the first thing every C programmer should learn is that they can and should ignore it. There is never any reason to wonder what the standard is supposed to do. The only thing that matters is what compilers actually do.

But if it's a pointer, the compiler doesn't know the alignment at compile time. Should the compiler insert an alignment check of every pointer access?
Compilers could add support for an unaligned attribute that we can apply to pointers. I'd prefer that to wrapping everything in a packed structure which is quite unsightly.

Would have been better if correct behavior was the default while pointer alignment requirements were opt in, just like vector stuff. Nothing we can do about it now.

I would hope the compiler is smart enough to figure out which accesses are aligned and unaligned on its own.

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

Wouldn't the compiler have to assume that every pointer access might be unaligned and do the slow "piece by piece" access every time? It can hardly guess the runtime value of a pointer during compilation.

It should be able to make a lot of inferences. For example, taking the address of some value allocated by the compiler itself results in an aligned pointer unless the programmer overrides it. Compiler should be able to trace it from there. Pointers from malloc are also aligned.

If compiler is not doing it for some reason, __builtin_assume_aligned can be used to explicitly mark a pointer as aligned.

The pointer might be something you forced. The compiler needs to do the right thing but if you set the pointer to an unaligned address because you have information on the hardware you can get this undefined situation with nothing the compiler can do about it.
Any reason the hardware pointer can't be accessed via the packed structure?

https://news.ycombinator.com/item?id=48205371

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

LMAO what?!

The compiler should pessimize each and every memory access everywhere with an alignment check on the pointer and a branch, or forego the efficient memory access method of the platform entirely and just do bytewise loads only?!

Unaligned access. Not every access. Compiler should be able to analyze code, determine alignment invariants and optimize everything it can. If not, __builtin_assume_aligned could help whenever it needs to be made explicit. Alignment should have been part of the type itself to begin with but there's no fixing that now.
That's why we write C instead of assembly, isn't it?

You could also mandate that a compiler for architectures without unaligned access either has to prove that the access is going to be aligned or insert a wrapper to turn the unaligned access into two aligned ones.

Just pretending the issue doesn't exist at all and making it the programmer's problem by leaving it as UB in the spec is a choice.

Unless your code targets some exotic architecture, like idk x86.
Not really. Wait until the compiler starts vectorizing your code and using instructions requiring alignment (like the ones with A or NT in the mnemonic).
Usually the compiler will probably not generate those
> Usually...probably...

you're betting against the compiler ever improving.

This would be a regression
You missed the point: the pointer existing as a value of that type at all is UB, even if you never try to access anything through it and no corresponding machine code is ever emitted.
Yes? I agree with that. I don't really see the issue there. The computer will allocate data in aligned addresses, so you would have to be doing something weird to begin with to access unaligned pointers. And aligned access is always better anyway. I guess packed structs are a thing if you're really byte golfing. Maybe compressed network data would also make sense.

But then I would assume you are aware of unaligned pointers, and have a sane way to parse that data, rather than read individual parts of it from a raw pointer.

I am curious, what would be a legitimate reason for an unaligned pointer to int?

String search algorithms would be one example, where a 64-bit register can be used as a “vector” containing 8x1 bytes.
Where is the part about unaligned pointers?
Strings typically consist of UTF-8 bytes, and any old `char*` pair has no alignment guarantees.