by _bxg1 2355 days ago
People have complained a lot about this decision by Apple, but I respect it. They're using their leverage over the ecosystem to cut off a huge piece of cruft from not only their own codebase, but many other codebases like Rust's that can now point to their decision in the face of criticism. And the only cost is that old binaries (not old code, old binaries) will no longer run without being rebuilt. Plus, this will probably never need to happen again because 64 bits can address 18 million terabytes of main memory.
2 comments

> And the only cost is that old binaries (not old code, old binaries) will no longer run without being rebuilt.

Not always true. There's plenty of 32bit code that can't trivially be ported over to 64bit, even if you have the full source for all the dependencies. For example, the file formats may have padding or pointer-width assumptions, so now you need a compatibility shim whenever loading or saving the files. Or the code may have implicit padding assumptions that are not documented anywhere, so now you get occasional random functionality bugs that take a huge effort to catch and debug.

It is not always a matter of flipping a compile-time switch for the developers, even if they are still around, and have all the source, and all the libraries and whatnot.
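To make the file-format point concrete, here is a minimal sketch of the kind of shim involved. The `struct record` type, its fields, and the 8-byte on-disk layout are all hypothetical, invented for illustration. The idea is to write each field at a fixed offset with a fixed width instead of dumping the in-memory struct, whose layout depends on pointer width and padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory record; its size and padding vary by target. */
struct record { int32_t id; void *payload; float weight; };

/* Hypothetical on-disk layout: 4-byte id + 4-byte float bits, little-endian. */
enum { RECORD_DISK_SIZE = 8 };

static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");

static void put_u32le(uint8_t *out, uint32_t v) {
    for (int i = 0; i < 4; i++) out[i] = (uint8_t)(v >> (8 * i));
}

static uint32_t get_u32le(const uint8_t *in) {
    uint32_t v = 0;
    for (int i = 0; i < 4; i++) v |= (uint32_t)in[i] << (8 * i);
    return v;
}

/* Writes the record field by field; the pointer is never stored. */
void record_encode(const struct record *r, uint8_t out[RECORD_DISK_SIZE]) {
    uint32_t wbits;
    memcpy(&wbits, &r->weight, sizeof wbits);
    put_u32le(out, (uint32_t)r->id);
    put_u32le(out + 4, wbits);
}

/* Reads the same layout back regardless of the host's pointer width. */
void record_decode(const uint8_t in[RECORD_DISK_SIZE], struct record *r) {
    uint32_t wbits = get_u32le(in + 4);
    r->id = (int32_t)get_u32le(in);
    memcpy(&r->weight, &wbits, sizeof r->weight);
    r->payload = NULL; /* must be rebuilt by the loader, not read from disk */
}
```

Old code that did the equivalent of `fwrite(&rec, sizeof rec, 1, f)` produced files whose layout matches only the 32bit build, which is why the shim has to be written by hand.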

I feel embarrassed asking this, but better that than not to. Is memory address space literally the _only_ difference between 32 and 64 bit architectures?

Not necessarily. While the size of pointers and the size of general purpose registers do change between 32 and 64-bit architectures, there can be other significant differences.

For example, on x86_64, the ISA provides additional GPRs that aren't just 64-bit versions of the 32-bit registers. rax is the 64-bit version of eax, but you also have r8-r15, which are not available in 32-bit mode at all. Furthermore, the ABI (at least on Windows) specifies a different function calling convention versus x86.

Additionally, I imagine there are additional instructions/ops on x86_64 over x86. I don't know how else the cpu would distinguish add %eax, 1 from add %rax, 1, both of which are legal when the cpu is running in 64-bit mode.
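For what it's worth, the two adds are distinguished by a prefix byte rather than a separate opcode: in 64-bit mode a REX prefix with the W bit set selects 64-bit operand size. A small sketch of the two encodings as byte arrays follows; the array names and the `has_rex_w` helper are made up for illustration, and the helper ignores any legacy prefixes that could precede REX.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* add eax, 1: opcode 0x83 /0, ModRM 0xC0 (register eax), imm8 0x01 */
const uint8_t add_eax_1[] = { 0x83, 0xC0, 0x01 };

/* add rax, 1: the same three bytes behind a REX.W prefix (0x48) */
const uint8_t add_rax_1[] = { 0x48, 0x83, 0xC0, 0x01 };

static_assert(sizeof add_rax_1 == sizeof add_eax_1 + 1,
              "the 64-bit form is one prefix byte longer");

/* Returns 1 if the first byte is a REX prefix (0100WRXB) with W set.
   Only meaningful when the bytes are decoded in 64-bit mode. */
int has_rex_w(const uint8_t *code, size_t len) {
    return len > 0 && (code[0] & 0xF8) == 0x48;
}
```

In 32-bit mode the byte 0x48 decodes as dec eax instead, so the same byte sequence means different things depending on the mode the cpu is in.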

I recall seeing a cool talk on how to confuse or crash many debuggers by doing something clever in assembly. The idea is you would write a block of polyglot 32 and 64-bit x86/64 assembly (i.e. binary that is both a valid x86 and x86_64 instruction sequence), switch the cpu from 32 to 64 bit mode at the end of the sequence, then branch back to the start of the block and reinterpret the same instructions as 64-bit rather than 32. You could use this technique to frustrate reverse engineering.

But pretty much all of those differences get handled at the compiler level, right?

If you thought about portability when you were writing the code, then yes, you just flip a compiler switch and you're good to go.

A lot of the older 32bit software never considered the need to switch to 64bit in the future, so there are plenty of implicit assumptions baked in. These are sometimes very obvious (e.g. hand coded 32bit assembly), but sometimes very hard to detect, and can have a lot of logic built on top of them. For example, consider these two structures:

    struct A { int n; void *p; float f; };
    struct B { int a, b, c; };

Both structures are 12 bytes large on 32bit, but on a typical 64bit ABI A grows to 24 bytes: the pointer is naturally aligned to an 8-byte boundary, so there are 4 bytes of padding after n, and another 4 bytes of tail padding after f so that the whole struct stays 8-byte aligned.
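If you'd rather check the layout than take anyone's word for it, something like the sketch below prints the numbers for whatever target it is built for. The two structs are the ones above; `padding_after_n` and `print_layouts` are helper names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct A { int n; void *p; float f; };
struct B { int a, b, c; };

/* Three ints in a row need no padding on any mainstream ABI. */
static_assert(sizeof(struct B) == 3 * sizeof(int), "B is expected to be unpadded");

/* Bytes the compiler inserted between n and p to align the pointer. */
size_t padding_after_n(void) {
    return offsetof(struct A, p) - (offsetof(struct A, n) + sizeof(int));
}

/* Prints sizes and offsets so 32bit and 64bit builds can be compared. */
void print_layouts(void) {
    printf("sizeof(void *)     = %zu\n", sizeof(void *));
    printf("sizeof(struct A)   = %zu\n", sizeof(struct A));
    printf("sizeof(struct B)   = %zu\n", sizeof(struct B));
    printf("offsetof(A, p)     = %zu\n", offsetof(struct A, p));
    printf("offsetof(A, f)     = %zu\n", offsetof(struct A, f));
    printf("padding after A.n  = %zu\n", padding_after_n());
}
```

Calling `print_layouts()` from `main` in one build with `-m32` and one with `-m64` (assuming a GCC or Clang toolchain with 32bit libraries installed) shows the difference side by side.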

There are plenty of ways the code could assume that sizeof(A) == sizeof(B). For example, it could use memcmp() to verify the structures are equal, or it could be allocating them from the same slab allocator to minimize fragmentation. These kinds of bugs will not be caught by the compiler, and they might not be easily noticeable at runtime. It takes significant effort to port, say, a 20 year old codebase written for 32bit over to 64bit.
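As a rough sketch of what the fix for the memcmp() case tends to look like: compare the fields, not the raw bytes, so padding never enters into it. `a_equal` is a hypothetical helper name, and the struct is the one from above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct A { int n; void *p; float f; };

/* Field-by-field equality. Unlike memcmp(), this never looks at padding
   bytes, whose contents are unspecified and can differ between two
   structs that hold the same values. */
bool a_equal(const struct A *x, const struct A *y) {
    assert(x != NULL && y != NULL);
    return x->n == y->n && x->p == y->p && x->f == y->f;
}
```

For the size assumption itself, putting `static_assert(sizeof(struct A) == sizeof(struct B), "shared slab size")` next to the code that relies on it turns the silent runtime bug into a build failure on the 64bit target. Note that comparing the float with == treats NaN differently than a byte comparison would, so this is not a drop-in replacement in every codebase.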

Yes, it's easy to say "that's bad code, you shouldn't have written it like that in the first place", but programming 20 years ago was drastically different than today, and those "hacks" could make or break a product back then. And, at the time some 32bit software was written, it was not even obvious what 64bit would look like, so it was not obvious how you'd prepare for it even if you wanted to.

I'm not an expert, but technically what it defines is the size of a "word", which is a basic unit of memory handled by the processor, used for - among other things - addresses. I think this might affect number precision too but I might be wrong. But regardless, most higher-level languages don't care about word size. Even C/C++ code can be written such that it doesn't care about word size, though it can also be written such that it does.
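A small illustration of the "can be written either way" point, with made-up function names: stashing a pointer in an integer is fine when the integer type is defined to be wide enough, and breaks when the code quietly assumes a pointer fits in an unsigned int.

```c
#include <assert.h>
#include <stdint.h>

/* uintptr_t is the integer type meant for holding a pointer value;
   the sketch assumes the platform provides it, as mainstream ones do. */
static_assert(sizeof(uintptr_t) >= sizeof(void *), "uintptr_t must hold a pointer");

/* Portable: round-trips a pointer through an integer wide enough for it. */
void *roundtrip_portable(void *p) {
    uintptr_t bits = (uintptr_t)p;
    return (void *)bits;
}

/* The assumption old 32bit code often made implicitly. Returns 1 where a
   pointer fits in an unsigned int (typical 32bit targets) and 0 where it
   does not (typical 64bit targets). */
int pointer_fits_in_unsigned_int(void) {
    return sizeof(unsigned int) >= sizeof(void *);
}
```

Code that only ever uses the first form recompiles cleanly for 64bit; code that cast pointers to unsigned int truncates them as soon as the second function starts returning 0.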