Hacker News
by dansalvato 854 days ago
I often see sentiment that x86 architecture itself is fundamentally a bottleneck, compared to ARM/RISC, when it comes to efficiency and performance per watt. Does someone with the right expertise have insight to share on this? I'm curious what the factors are here. I would imagine that a factor is that most ARM implementations we see (like from Qualcomm, Nvidia, and Apple) are full-on SoCs which will naturally have efficiency benefits. But I'd love to learn more about this before "taking sides" and declaring that so-and-so architecture is the future, or whatever.

X86 and ARM are practically the same these days in performance.

Let me explain RISC vs CISC: RISC is Reduced Instruction Set Computer and CISC is Complex Instruction Set Computer. The base instruction set of x86 is CISC. The base instruction set of ARM is RISC.

As technology evolves and newer instruction set extensions are required to handle new tasks, they often become more complex, so today, with the variety of extensions on both x86 and ARM, the two are closer to each other than ever. They are still different.

Now going to the differences between ARM and x86 where it matters. ARM has the lowest power draw to performance ratio but its need for power skyrockets as it approaches more complicated tasks. x86 starts higher in power draw but its performance is pretty maintained under all workloads.

Neither is wrong; it just depends on what you're planning to do. x86 is probably the best architecture for the general user, but ARM is great in a phone if everything stays relatively simple. Notice how some phones with 4000 mAh battery will be dead in 30mins while playing a game which the Steam Deck can handle for maybe 1.5 hours of gameplay? However if I wanted something to stay on all day to receive text notifications and pretty much be idle in my pocket, I'd want an ARM processor. I think the real reason Apple went ARM is that after studying the life habits of their customers they realized most of their laptop users fold sleep their devices like a phone without ever really charging them. At least the majority do. When I worked in a Mac shop (the term for a Mac-only software development team), the constant cycle of fold, close, open, recharge would've been better served by an ARM processor. Someone at work even asked if they could just code on their Samsung phone because it got better battery life than the old Intel Macs.

Isn't there a problem with the parallel decoding of x86 streams due to all the different instruction lengths? I've read that going much beyond 4 parallel decodes in x86 gets increasingly hard, requiring expensive combinatorial logic. Meanwhile ARM instruction decoding can trivially be parallelized as much as you want.
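
The dependency described here can be shown with a toy sketch in Python. The one-byte length rule below is hypothetical and much simpler than real x86 encoding; it only illustrates why fixed-width instruction starts can be computed independently while variable-width starts form a serial chain.

```python
# Toy illustration only: both encodings are hypothetical, not real x86 or ARM.

def fixed_width_starts(code: bytes, width: int = 4) -> list[int]:
    """Every start offset is known up front, so each decoder slot can work independently."""
    return list(range(0, len(code), width))


def toy_length(first_byte: int) -> int:
    """Hypothetical rule: the low two bits of the first byte give a length of 1 to 4 bytes."""
    return (first_byte & 0b11) + 1


def variable_width_starts(code: bytes) -> list[int]:
    """Each start depends on the previous instruction's length, so the scan is serial."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])
    return starts


code = bytes([0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x01, 0xDD, 0x02, 0xEE, 0xFF, 0x00, 0x00])
print(fixed_width_starts(code))     # [0, 4, 8]
print(variable_width_starts(code))  # [0, 4, 5, 7, 10, 11]
```

In the fixed-width case the Nth start is just `N * width`. In the variable-width case, nothing about instruction N is known until instructions 0 through N-1 have at least had their lengths determined, which is the part that hardware has to work around.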

Other than this I am not familiar with any other fundamental limits to making x86 as efficient as ARM or ARM as fast as x86.

You don't need expensive combinatorial logic. You just use a predictor to decide where the instruction boundaries are. This is the same strategy CPUs have used for branch predictors. Now your performance merely depends on the accuracy of the branch predictors and the prevalence of difficult to predict instruction sequences. That is hardly a show stopper for the high end and you have to remember that compilers try to optimize their code, so there is no reason for them to produce slow code on purpose.
Sorta. This is actually a CPU cache thing: ARM can do it efficiently without needing a lot of CPU cache to handle parallel decoding, while x86 requires more cache to do so. However, more cache has benefits beyond this task, and cache is also getting cheaper.
That still implies both more logic and more "hot" silicon, so decoding is higher overhead.
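
To make the predictor idea from the comments above concrete, here is a toy Python sketch. It is not a claim about how any shipping x86 core works: the length rule, the `BoundaryPredictor` class, and the block address are all hypothetical. It remembers where instructions started the last time a block of code was decoded, then only verifies that guess on later visits. Each verification step looks at a single instruction, which is the kind of check hardware could run side by side.

```python
# Toy sketch: hypothetical encoding and predictor, not any real CPU's mechanism.

def toy_length(first_byte: int) -> int:
    """Hypothetical rule: the low two bits of the first byte give a length of 1 to 4 bytes."""
    return (first_byte & 0b11) + 1


def serial_scan(code: bytes) -> list[int]:
    """Slow path: walk the bytes one instruction at a time to find every start."""
    starts, offset = [], 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code[offset])
    return starts


class BoundaryPredictor:
    """Remembers instruction starts per block address and reuses them when still valid."""

    def __init__(self) -> None:
        self._remembered: dict[int, list[int]] = {}
        self.slow_scans = 0

    def starts_for(self, block_addr: int, code: bytes) -> list[int]:
        guess = self._remembered.get(block_addr)
        if guess is not None and self._verify(guess, code):
            return guess  # fast path: the remembered boundaries still hold
        self.slow_scans += 1  # misprediction or first visit: pay for a serial scan
        starts = serial_scan(code)
        self._remembered[block_addr] = starts
        return starts

    @staticmethod
    def _verify(starts: list[int], code: bytes) -> bool:
        if not starts:
            return len(code) == 0
        if starts[0] != 0 or any(s >= len(code) for s in starts):
            return False
        # Each end offset depends on one instruction only, so these checks are independent.
        ends = [s + toy_length(code[s]) for s in starts]
        return ends[:-1] == starts[1:] and ends[-1] >= len(code)


predictor = BoundaryPredictor()
loop_body = bytes([0x03, 0xAA, 0xBB, 0xCC, 0x00, 0x01, 0xDD])
for _ in range(1000):
    predictor.starts_for(0x1000, loop_body)
print(predictor.slow_scans)  # 1
```

The extra state and the verification logic are exactly the "more logic and more hot silicon" cost raised in the reply: the serial scan is avoided most of the time, but not for free.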

I recall reading about creating a subset of x86_64 that would be faster to decode, but this would effectively be a different architecture so at that point you might as well go to ARM64 or RISC-V.

I do know that if the instruction set decodes efficiently and is compact (to reduce memory bandwidth) it really doesn't matter much beyond that.

>I do know that if the instruction set decodes efficiently and is compact (to reduce memory bandwidth) it really doesn't matter much beyond that.

RISC-V is also simple, and that's relative to ARM64, never mind x86.

I.e. it is achieving highly competitive code density and instruction count despite being simpler.

It doesn't matter though, since technology is ever evolving. More cache will eventually be the norm on chips, and wider lanes for threads too.

M1 has four times the bit width of an AMD Ryzen processor. Supposedly the next generation of Ryzen processors, Zen 5, will have a wider bit width.

I don't think any of this is accurate.

For starters, the CISC vs RISC debate has been dead for decades now. Architectures often considered RISC, like ARM, have had vector instructions and branch predictors for a long time now.

> Now going to the differences between ARM and x86 where it matters. ARM has the lowest power draw to performance ratio but its need for power skyrockets as it approaches more complicated tasks. x86 starts higher in power draw but its performance is pretty maintained under all workloads.

These two sentences are contradictory!

That's not the point, the point is complexity of both instruction sets is not too far off.

Initial power draw on ARM is lower but jumps in complicated tasks.

Initial power draw on x86 is higher but maintains in complicated tasks.

> That's not the point, the point is complexity of both instruction sets is not too far off.

So you agree that CISC vs RISC is not a thing nowadays.

> Initial power draw on ARM is lower but jumps in complicated tasks.

> Initial power draw on x86 is higher but maintains in complicated tasks.

What's a "complicated" task? What's the power draw baseline? What are the examples?

Otherwise, these sentences mean nothing.

> So you agree that CISC vs RISC is not a thing nowadays.

Correct.

> What's a "complicated" task? What's the power draw baseline? What are the examples?

Complicated tasks typically involve the use of numerous instruction sets working together to complete a task, like video games that have physics and AI. Exclude AI co-processors for this example. Or even burdening the system with tons of multi-tasking. ARM succumbs.

AMD's x86 Ryzen chips rival M series processors, but under stress can do more. M series is the pinnacle of ARM, you won't find anything ARM near it in any way.

> Or even burdening the system with tons of multi-tasking. ARM succumbs.

This is clearly untrue, and you can tell it to my laptop running ~500 processes right now.

> AMD's x86 Ryzen chips rival M series processors, but under stress can do more. M series is the pinnacle of ARM, you won't find anything ARM near it in any way.

"Can do more", of what? What are your metrics, other than what appears to be a gut feeling?

ARM64 and AMD64 do not have any significant differences in complexity. In 2024 they are both massively CISC, and most significant idiosyncrasies of both were ironed out in the 64-bit transition. They both use translation layers that take up a minuscule fraction of the transistor budget. ARM uses a more relaxed memory model which can give it a slight advantage, but there are enough trade-offs there that it's not a pure win either.

IOW, in 2024 there's no significant advantage in ISA choice.

There are dozens of factors that matter more than ISA choice -- process node, design team competence, transistor budget, design goals, memory bandwidth, et cetera.

Your comment would be much more accurate if you said "Zen4" and "M1" instead of x86 and ARM. (I chose those two because they're on the same process). Zen4 is better than M1 in some metrics and M1 is better in others. But that's mostly because they had different design goals.

> Notice how some phones with 4000 mAh battery will be dead in 30mins while playing a game which the Steam Deck can handle for maybe 1.5 hours of gameplay? However if I wanted something to stay on all day to receive text notifications and pretty much be idle in my pocket, I'd want an ARM processor.

Untrue, one can easily play GTA Trilogy or Genshin Impact on an iPhone for ~4 hours. But even if it were true, a phone battery is rated, on average, between 12Wh and 20Wh, whereas the Steam Deck is 40Wh.
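
The battery comparison can be checked with some quick arithmetic, sketched here in Python. The 3.85 V nominal cell voltage is an assumption (a typical figure for phone lithium-ion cells, not something stated in the thread); the 4000 mAh, 30 minute, 1.5 hour, and 40Wh figures are the ones quoted above.

```python
# Assumption: 3.85 V nominal cell voltage; real phones vary.
NOMINAL_VOLTAGE = 3.85


def mah_to_wh(capacity_mah: float, voltage: float = NOMINAL_VOLTAGE) -> float:
    """Convert a capacity in mAh to energy in Wh at the given nominal voltage."""
    return capacity_mah / 1000 * voltage


def average_draw_watts(energy_wh: float, runtime_hours: float) -> float:
    """Average power needed to drain the given energy in the given time."""
    return energy_wh / runtime_hours


phone_wh = mah_to_wh(4000)                      # about 15.4 Wh
phone_draw = average_draw_watts(phone_wh, 0.5)  # dead in 30 minutes
deck_draw = average_draw_watts(40, 1.5)         # Steam Deck, 40 Wh over 1.5 hours

print(f"phone battery: {phone_wh:.1f} Wh")
print(f"phone average draw: {phone_draw:.1f} W")
print(f"Steam Deck average draw: {deck_draw:.1f} W")
```

Under these assumptions the phone would need to average roughly 31 W to die in 30 minutes, more than the roughly 27 W implied for the Steam Deck, so the two runtimes say more about battery size and the quoted numbers than about the ISA.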

> I think the real reason Apple went ARM is that after studying the life habits of their customers they realized most of their laptop users fold sleep their devices like a phone without ever really charging them.

Nonsense, suspend to RAM has been a staple in laptops for decades now.

> Someone at work even asked if they could just code on their Samsung phone because it got better battery life than the old Intel Macs.

Wow, just wow.

GTA Trilogy for 4H on an iPhone? I really doubt even the Max variant at lowest brightness can do that. Generational improvements have been mediocre and I know for a fact that if you get 2H runtime with a moderate game (like Risk) it's already pretty good.

You are a funny guy.

> I know for a fact that if you get 2H runtime with a moderate game (like Risk) it's already pretty good.

That's not my experience at all, but alright.