	by tovej 24 days ago
	But that seems obvious. You can't load an integer from an unaligned address. It's not only a C-level thing, is it? There's no machine code for that either, or at least none that is guaranteed across architectures.

4 comments

codeflo 24 days ago

> You can't load an integer from an unaligned address.

You can, and the results are machine specific, clearly defined and well-documented. Ancient ARM raises an exception, modern ARM and x86 can do it with a performance penalty. It's only the C or C++ layer that is allowed to translate the code into arbitrary garbage, not the CPU.

saagarjha 24 days ago

There’s usually not a performance penalty on modern hardware.

orlp 24 days ago

There's typically only a performance penalty if the unaligned load spans a cache line on modern hardware.
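
To make the cache-line point concrete, here is a minimal sketch of the check for whether a load straddles two cache lines. The 64-byte line size and the helper name `spans_cache_line` are assumptions for illustration; the real line size depends on the CPU.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache line size; 64 bytes is common but not universal. */
#define CACHE_LINE_SIZE 64u

/* Returns true if a load of `size` bytes starting at `addr` touches
   more than one cache line. Hypothetical helper, not a library API. */
static bool spans_cache_line(uintptr_t addr, size_t size)
{
    if (size == 0) {
        return false;
    }
    /* Compare the line index of the first and the last byte touched. */
    return (addr / CACHE_LINE_SIZE) != ((addr + size - 1) / CACHE_LINE_SIZE);
}
```

Under that assumption, an 8-byte load at offset 56 stays inside one line, while the same load at offset 57 touches two.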

matheusmoreira 24 days ago

Sure you can. In many architectures it works just fine. Works perfectly in x86_64, for example. It's just a little slower.

tovej 24 days ago

"In many architectures" does not mean you can. The standard is supposed to cover all architectures.

matheusmoreira 24 days ago

If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead. Load multiple integers and shift and mask away the irrelevant bits, done. This is exactly what modern architectures already do in hardware. Works, it's just a little slower.

This is exactly what the compilers do if you use a packed structure to access unaligned data. Works everywhere, as expected. Compilers have always known what to do, they just weren't doing it. C standard says no.
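
A minimal sketch of the packed-structure approach described here, assuming GCC or Clang. The `packed` and `may_alias` attributes are compiler extensions, not standard C, and the names `unaligned_u32`, `load_u32_packed` and `store_u32_packed` are made up for illustration.

```c
#include <stdint.h>

/* GCC/Clang extension: a packed struct has alignment 1, so the compiler
   may not assume the member sits on a 4-byte boundary. may_alias is added
   so the struct can be laid over storage of any type. */
struct unaligned_u32 {
    uint32_t value;
} __attribute__((packed, may_alias));

/* Load a uint32_t from an address with no alignment guarantee. */
static uint32_t load_u32_packed(const void *p)
{
    return ((const struct unaligned_u32 *)p)->value;
}

/* Store a uint32_t to an address with no alignment guarantee. */
static void store_u32_packed(void *p, uint32_t v)
{
    ((struct unaligned_u32 *)p)->value = v;
}
```

On targets that allow unaligned loads the compiler is free to emit a single instruction for this; on targets that trap it has to fall back to smaller accesses.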

The fact is the standard is garbage and the first thing every C programmer should learn is that they can and should ignore it. There is never any reason to wonder what the standard is supposed to do. The only thing that matters is what compilers actually do.

da-alex 24 days ago

But if it's a pointer, the compiler doesn't know the alignment at compile time. Should the compiler insert an alignment check of every pointer access?
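
For reference, this is roughly what such a per-access check would look like if written by hand. It is only a sketch: `is_aligned_u32` and `load_u32_checked` are hypothetical names, `_Alignof` needs C11, and the fast path assumes the bytes at `p` really do hold a `uint32_t` object.

```c
#include <stdint.h>
#include <string.h>

/* True if `p` is suitably aligned for a uint32_t. */
static int is_aligned_u32(const void *p)
{
    return ((uintptr_t)p % _Alignof(uint32_t)) == 0;
}

/* Hypothetical checked load: direct access when aligned,
   memcpy when not. */
static uint32_t load_u32_checked(const void *p)
{
    if (is_aligned_u32(p)) {
        return *(const uint32_t *)p;   /* fast path: plain aligned load */
    }
    uint32_t v;
    memcpy(&v, p, sizeof v);           /* slow path: no alignment assumed */
    return v;
}
```

The cost being debated in this thread is the test and branch in `load_u32_checked`, which would be paid on every access the compiler cannot prove aligned.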

matheusmoreira 24 days ago

Compilers could add support for an unaligned attribute that we can apply to pointers. I'd prefer that to wrapping everything in a packed structure which is quite unsightly.

It would have been better if correct behavior were the default and pointer alignment requirements were opt-in, just like the vector stuff. Nothing we can do about it now.

I would hope the compiler is smart enough to figure out which accesses are aligned and unaligned on its own.
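
Something close to the attribute wished for here can be sketched with a GCC/Clang extension: as far as I know, GCC documents that `aligned` on a typedef may lower the alignment as well as raise it. Treat this as an assumption to check against your compiler's documentation; the names `unaligned_u32_t` and `load_u32_typedef` are invented for this example.

```c
#include <stdint.h>

/* GCC/Clang extension (assumption, see lead-in): on a typedef, `aligned`
   may lower the alignment. may_alias additionally lets the type read
   storage that was declared with a different type. */
typedef uint32_t unaligned_u32_t __attribute__((aligned(1), may_alias));

/* Load through a pointer whose pointee type carries alignment 1. */
static uint32_t load_u32_typedef(const void *p)
{
    const unaligned_u32_t *q = p;
    return *q;
}
```

The alignment lives in the pointee type, so every dereference of an `unaligned_u32_t *` is compiled without the usual 4-byte assumption.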

josefx 24 days ago

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

Wouldn't the compiler have to assume that every pointer access might be unaligned and do the slow "piece by piece" access every time? It can hardly guess the runtime value of a pointer during compilation.
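
For concreteness, a "piece by piece" load can be sketched as below. This is the simple byte-wise variant, not the two-aligned-words-plus-shift variant described in the quoted comment, and it assumes the value is stored little-endian. `load_u32_le_bytewise` is a name made up for the example.

```c
#include <stdint.h>

/* Assemble a 32-bit little-endian value one byte at a time.
   Every memory access is a byte access, so no alignment is required. */
static uint32_t load_u32_le_bytewise(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

That is four loads, three shifts and three ORs where an aligned access needs one load, which is the slowdown being asked about.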

matheusmoreira 23 days ago

It should be able to make a lot of inferences. For example, taking the address of some value allocated by the compiler itself results in an aligned pointer unless the programmer overrides it. The compiler should be able to trace it from there. Pointers from malloc are also aligned.

If the compiler is not doing it for some reason, __builtin_assume_aligned can be used to explicitly mark a pointer as aligned.
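
A minimal sketch of how `__builtin_assume_aligned` is used, assuming GCC or Clang. The function name `sum_aligned16` and the 16-byte figure are choices made for this example only.

```c
#include <stddef.h>
#include <stdint.h>

/* Sums `n` values. The caller promises that `data` is 16-byte aligned;
   breaking that promise is undefined behavior. */
static uint32_t sum_aligned16(const uint32_t *data, size_t n)
{
    /* The builtin returns its argument; the compiler may assume the
       returned pointer is aligned to at least 16 bytes. */
    const uint32_t *p = __builtin_assume_aligned(data, 16);
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++) {
        total += p[i];
    }
    return total;
}
```

The builtin is a promise from the programmer, not a check. Nothing verifies the alignment at run time.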

bluGill 24 days ago

The pointer might be something you forced. The compiler needs to do the right thing, but if you set the pointer to an unaligned address because you have information about the hardware, you can get into this undefined situation with nothing the compiler can do about it.

matheusmoreira 24 days ago

Any reason the hardware pointer can't be accessed via the packed structure?

https://news.ycombinator.com/item?id=48205371

saagarjha 24 days ago

The same reason you probably aren’t adding manual alignment fixes to your code?

bluGill 24 days ago

However, you certainly can do that. The point about unaligned data is that the hardware can't load it in a single memory access. It needs two accesses, and between them the value at one of the two addresses can change.

I would hope you're not so stupid as to design hardware that relies on this, but the fact is that it is certainly possible for someone to do that. And if you do, there is nothing that the compiler or the standard can do. It can't be done correctly.

mike_hock 24 days ago

> If some architecture traps on unaligned access, then the compiler can and should simply generate the correct code so that it loads the integer piece by piece instead.

LMAO what?!

The compiler should pessimize each and every memory access everywhere with an alignment check on the pointer and a branch, or forgo the efficient memory access method of the platform entirely and just do bytewise loads only?!

matheusmoreira 23 days ago

Unaligned access. Not every access. Compiler should be able to analyze code, determine alignment invariants and optimize everything it can. If not, __builtin_assume_aligned could help whenever it needs to be made explicit. Alignment should have been part of the type itself to begin with but there's no fixing that now.

mike_hock 20 days ago

So yes, pessimize each and every access. No, that's not acceptable. And no, the fact that the compiler can get rid of some of the alignment checks where static analysis can prove that the pointer is aligned doesn't cut it.

Yes, making alignment part of the type system would be the correct fix. And yes, that's absolutely still possible since unaligned access is still UB. You're not breaking existing code. You could easily add new pointer types with (static) alignment information.

crote 24 days ago

That's why we write C instead of assembly, isn't it?

You could also mandate that a compiler for architectures without unaligned access either has to prove that the access is going to be aligned or insert a wrapper to turn the unaligned access into two aligned ones.

Just pretending the issue doesn't exist at all and making it the programmer's problem by leaving it as UB in the spec is a choice.

mbel 24 days ago

Unless your code targets some exotic architecture, like idk x86.

cataphract 24 days ago

Not really. Wait until the compiler starts vectorizing your code and using instructions requiring alignment (like the ones with A or NT in the mnemonic).

saagarjha 24 days ago

Usually the compiler will probably not generate those

bigfishrunning 24 days ago

> Usually...probably...

You're betting against the compiler ever improving.

saagarjha 24 days ago

This would be a regression

bigfishrunning 24 days ago

Why? Automatic vectorization is pretty bad and has been for years, but wouldn't it be nice if the compiler could unroll loops and use SIMD instructions to make your code faster while also being correct?

pjc50 24 days ago

You missed the point: the pointer existing as a value of that type at all is UB, even if you never try to access anything through it and no corresponding machine code is ever emitted.
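
One way to stay clear of this is to never form the misaligned typed pointer at all. A minimal sketch using `memcpy`, with `load_u32_memcpy` as a made-up helper name:

```c
#include <stdint.h>
#include <string.h>

/* Reads a uint32_t from any address without ever creating a
   `uint32_t *` that points at it. The bytes are copied into a
   properly aligned local and the result uses the host byte order. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

The source address only ever exists as a `const void *`, so there is no conversion to an insufficiently aligned `uint32_t *` for the rule to apply to.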

tovej 24 days ago

Yes? I agree with that. I don't really see the issue there. The computer will allocate data at aligned addresses, so you would have to be doing something weird to begin with to end up accessing unaligned pointers. And aligned access is always better anyway. I guess packed structs are a thing if you're really byte golfing. Maybe compressed network data would also make sense.

But then I would assume you are aware of unaligned pointers, and have a sane way to parse that data, rather than read individual parts of it from a raw pointer.

I am curious, what would be a legitimate reason for an unaligned pointer to int?

simonask 24 days ago

String search algorithms would be one example, where a 64-bit register can be used as a “vector” containing 8x1 bytes.

jstimpfle 24 days ago

Where is the part about unaligned pointers?

simonask 24 days ago

Strings typically consist of UTF-8 bytes, and any old `char*` pair has no alignment guarantees.

jstimpfle 24 days ago

That's true, and that's why your typical string vector code has a prelude and a postlude to do the incomplete chunks at the ends. Between the ends, it's processing larger self-aligned chunks.
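
A rough sketch of that prelude/body/postlude shape, written as a byte search over 8-byte chunks. `find_byte` is a hypothetical helper for illustration only (in real code `memchr` is the tool), and the zero-byte bit trick is the well-known word-at-a-time test, not something taken from this thread.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ONES  0x0101010101010101ULL
#define HIGHS 0x8080808080808080ULL

/* Returns the index of the first byte equal to `c`, or `n` if absent. */
static size_t find_byte(const unsigned char *s, size_t n, unsigned char c)
{
    size_t i = 0;

    /* Prelude: byte by byte until s + i is 8-byte aligned. */
    while (i < n && ((uintptr_t)(s + i) % 8) != 0) {
        if (s[i] == c) {
            return i;
        }
        i++;
    }

    /* Body: aligned 8-byte chunks. XOR turns matching bytes into zero,
       and the expression below is nonzero iff some byte of x is zero. */
    const uint64_t pattern = ONES * c;
    while (n - i >= 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);  /* address is aligned here; memcpy
                                         sidesteps the aliasing rules */
        uint64_t x = w ^ pattern;
        if ((x - ONES) & ~x & HIGHS) {
            break;                    /* a match is somewhere in this chunk */
        }
        i += 8;
    }

    /* Postlude: leftover bytes, or the chunk that contains the match. */
    for (; i < n; i++) {
        if (s[i] == c) {
            return i;
        }
    }
    return n;
}
```

Only the prelude and postlude touch memory at arbitrary offsets, and they do so one byte at a time, so no misaligned wide load is ever issued.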