HN Mirror



	by barrkel 6007 days ago
	The stack is not actually aligned on function entry, because the return address is on top, so more alignment will be needed to avoid SSE2 locals being misaligned. It's not so hard for the callee side of the ABI to make sure the stack is aligned if it's going to use SSE2 and friends; it's rather more onerous to require every call site to make the alignments for the benefit of the callee.

1 comments

tentonova2 6007 days ago

> The stack is not actually aligned on function entry, because the return address is on top, so more alignment will be needed to avoid SSE2 locals being misaligned.

The stack has *known alignment* on entry, which removes the need to compute alignment at runtime. Any other approach requires more instructions overall.

> It's not so hard for the callee side of the ABI to make sure the stack is aligned if it's going to use SSE2 and friends; it's rather more onerous to require every call site to make the alignments for the benefit of the callee.

I disagree that it's onerous. It seems silly to increase the runtime costs in exchange for a minutely simplified compiler port. It's not as if non-4-byte aligned ABIs are unusual.
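
To make the "known alignment" point concrete, here is a minimal sketch in C of the arithmetic involved. It assumes x86-32 with a 4-byte return address and an ABI that keeps esp 16-byte aligned at the point of the call; the helper names are hypothetical, and saved registers such as ebp are ignored for simplicity.

```c
#include <assert.h>
#include <stdint.h>

enum { ALIGN = 16, RET_ADDR_BYTES = 4 };  /* x86-32: 4-byte return address */

/* Callee-side realignment when nothing is known about esp at entry.
 * Equivalent to `and esp, -16`, executed at run time. The old esp must
 * also be kept somewhere (typically ebp) so it can be restored. */
static uint32_t realign_at_runtime(uint32_t sp)
{
    return sp & ~(uint32_t)(ALIGN - 1);
}

/* With a caller-aligned ABI, esp is a multiple of 16 just before `call`,
 * so esp % 16 == 12 at entry. The frame size that restores alignment is
 * then a compile-time constant: the smallest value >= locals_bytes that
 * is congruent to 12 modulo 16. */
static uint32_t frame_bytes_known_entry(uint32_t locals_bytes)
{
    uint32_t entry_mod = ALIGN - RET_ADDR_BYTES;  /* 12 */
    uint32_t pad = (entry_mod + ALIGN - locals_bytes % ALIGN) % ALIGN;
    return locals_bytes + pad;
}
```

For 20 bytes of locals the second helper yields a 28-byte frame, so a single `sub esp, 28` leaves esp aligned. The first helper stands in for the mask that a callee has to apply when the entry alignment is not guaranteed.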


barrkel 6007 days ago

But instead of aligning the stack in one location, the callee, now it needs to be aligned everywhere. It's pretty probable that's more instructions everywhere.

And it's not a "minutely simplified compiler port". That statement is startlingly naive. Do you have any idea how much hand-coded inline assembly, both in the runtime library and in customer code, needs to be carefully reviewed and modified to port from a platform without this requirement to one with it? Particularly since almost every other platform targeting the same architecture doesn't have the requirement?
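
For reference, the call-site obligation being argued about can be sketched as follows. This is an illustration only: it assumes the caller's own esp is already 16-byte aligned before it lays out outgoing arguments, and `call_site_padding` is a hypothetical name. Whether the padding costs extra instructions or is folded into the caller's frame depends on the code generator.

```c
#include <stdint.h>

enum { STACK_ALIGN = 16 };

/* Hypothetical helper: bytes of padding a caller must reserve so that,
 * after arg_bytes of outgoing arguments are placed on an already
 * 16-byte-aligned stack, esp is again a multiple of 16 at the `call`. */
static uint32_t call_site_padding(uint32_t arg_bytes)
{
    return (STACK_ALIGN - arg_bytes % STACK_ALIGN) % STACK_ALIGN;
}
```

A call passing a single 4-byte argument needs 12 bytes of padding under this assumption, while one passing 16 bytes of arguments needs none.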


tentonova2 6007 days ago

> But instead of aligning the stack in one location, the callee, now it needs to be aligned everywhere. It's pretty probable that's more instructions everywhere.

SSE2 is used everywhere. That's unlikely.

Do you have any idea what the advantages are of being able to use SSE2+ everywhere? I find your position to be startlingly naive, especially given the fact that the vast majority of the existing Mac OS X developer base did not have any hand-coded inline assembly targeted at x86-32.

Other than game developers, how many legacy x86-32 developers is Apple genuinely interested in courting? Even for game developers (or JIT authors, or otherwise) with an overabundance of x86 4-byte-alignment-assuming assembly, fixing stack alignment is an annoying issue, not an impossible one.


barrkel 6007 days ago

Ah yes, Apple doesn't want any more developers for its platform. I forgot about that.


tentonova2 6007 days ago

No, Apple made a perfectly sane business and technical decision to optimize for their users and existing developer base rather than a small subset of the non-Apple developer base who would have an issue with 16-byte stack alignment.

The reasoning makes sense and I'd have done the same. I fixed our code and moved on.


barrkel 6007 days ago

The vote is clearly in, and the majority is siding with Apple. I haven't changed my position, though, and the more you write in this thread, the more convinced I am that you don't know what you're talking about. The technical reasons are not strong; SSE2 is primarily for floating point ops and SIMD vectorized ops. Most user code does not use floating point, and it's hard for compilers to extract latent parallelism to produce vectorized code.

However, if you put the technical reasons aside, and only focus on the business reasons for making a choice here, it's clear to me that the best way to go is to fall in line with the existing precedents for the platform. That way maximizes your business upside. There is no business reason for wanting 16-byte alignment, only business reasons for not wanting it.

The technical case would need to outweigh the business case in order for it to win. But I don't see the technical case as being that strong. Floating point code is rare. Outside of scalable vector-oriented UI libraries and domain-specific number crunching, it's hardly ever used. Many architectures survived for decades with only optional support for floating point, in a coprocessor. Many embedded architectures still use emulated FP, if it's needed at all.

And I really meant it about shockingly naive back there. It tells me everything I need to know about what you know about commercial compilers: that you think of them in the academic sense of being the bit that turns text into code. There's more to it than that in the real world.
