|
|
|
|
|
by magduf
2368 days ago
|
|
x86 is a lousy architecture, but x86-64 isn't as bad; at least it has a good number of registers unlike x86. It's easy to prove x86 is lousy by seeing its success in embedded and mobile devices: it hasn't had any, despite trying (Atom). ARM reigns supreme here. x86 manages to hold on because of 1) backwards compatibility with proprietary software (namely Windows), and 2) inertia. We don't really notice it that much because we've covered over it with abstraction layers: almost no one writes assembly code any more. We would be better off if we switched to something better, but we wouldn't notice it much; we might see a slight amount of power savings, and OS developers would be happier. But the benefits just aren't worth the costs. It's unfortunate, though: the major chipmakers should be able to just develop a nice, clean-sheet architecture which retains the good parts of x86 (like the PCIe bus and enumeration, things missing on ARM) and cleans up the bad things, and users shouldn't have to do anything other than make sure to select the Linux distro that matches that architecture. The presence of binary-only proprietary software that only works on x86(/64) is probably the biggest reason we're stuck with it: a competing CPU maker can't just come up with something new and have it "just work" (after getting their compiler changes merged into GCC and LLVM of course). |
|
Funny thing.
Whether you boot a modern x86 system in 32 bit mode or 64 bit mode doesn't change the number of registers you're using. You're still using the 32-128 physical registers on the core. That's why x86_64 code isn't particularly faster (sometimes slower) than the same code compiled for x86_32 mode.
People have this zany idea that assembly language is a low level language. It's not. When the CPU executes 32 bit x86 code it "compiles" it to uops that are totally unrecognizable to us and use dozens to hundreds of registers.
The thing about embedded kinda exposes your mental bias. When you scale up the x86 instruction decoder from Intel Atom scale to Xeon scale, the instruction decoder gets a little bit more complicated, but it's given a ton more tools to use. So sure, Atom sucks at embedded, but x86 is still king at desktop and beyond, and ARM will never be able to challenge it.
If ARM wanted to challenge x86 in terms of single threaded performance, it would need to do the same thing x86 does: have a super complicated instruction decoder that maps the 32 logical registers defined by the ISA to its 128 physical ones, reschedules everything, renames stuff where appropriate, identify loads that can be elided, etc. And all of the advantages of having a simple ISA go out the window, because the ISA is an illusion.
Unfortunately Intel's Architecture Code Analyzer is dead. LLVM MCA is almost as good. I recommend you play around with it a bit some time. CPUs kinda don't give a crap if you're speculating four loop iterations ahead and you're just using the same few registers over and over again for multiple purposes.