by api 854 days ago
Isn't there a problem with the parallel decoding of x86 streams due to all the different instruction lengths? I've read that going much beyond 4 parallel decodes in x86 gets increasingly hard, requiring expensive combinatorial logic. Meanwhile ARM instruction decoding can trivially be parallelized as much as you want.
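
To make the concern concrete, here is a minimal sketch in Python. It assumes a made-up toy encoding in which the low two bits of an instruction's first byte give its length (1 to 4 bytes). That encoding is not real x86, and the function names are hypothetical. With fixed-width instructions every start offset is known up front. With variable-length instructions each start depends on the previous instruction's length, unless you speculatively compute a length at every byte offset and then select the real chain, which is roughly where the extra logic comes from.

```python
def fixed_starts(code: bytes, width: int = 4) -> list[int]:
    """Fixed-width ISA: every start offset is known without looking at the bytes."""
    return list(range(0, len(code), width))


def toy_length(code: bytes, offset: int) -> int:
    """Hypothetical toy encoding: low 2 bits of the first byte give length 1..4."""
    return (code[offset] & 0b11) + 1


def variable_starts_serial(code: bytes) -> list[int]:
    """Variable-length ISA, naive approach: each start depends on the previous length."""
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code, offset)
    return starts


def variable_starts_speculative(code: bytes) -> list[int]:
    """Compute a length at EVERY byte offset (parallel in hardware), then pick the chain."""
    # Step 1: independent per-offset work; most of these results are thrown away.
    lengths = [toy_length(code, i) for i in range(len(code))]
    # Step 2: select the real instruction chain from the speculative lengths.
    starts = []
    offset = 0
    while offset < len(code):
        starts.append(offset)
        offset += lengths[offset]
    return starts


code = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x11, 0x22, 0x33])
print(fixed_starts(bytes(16)))            # [0, 4, 8, 12]
print(variable_starts_serial(code))       # [0, 3, 4, 6]
print(variable_starts_speculative(code))  # [0, 3, 4, 6]
```

In this toy model the speculative version does a length calculation for all 10 byte offsets to find 4 instructions. That wasted work grows with the number of bytes examined per cycle.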

Other than this I am not familiar with any other fundamental limits to making x86 as efficient as ARM or ARM as fast as x86.

2 comments

You don't need expensive combinatorial logic. You just use a predictor to decide where the instruction boundaries are. This is the same strategy CPUs already use for branch prediction. Now your performance merely depends on the accuracy of the boundary predictor and the prevalence of difficult-to-predict instruction sequences. That is hardly a showstopper for the high end, and you have to remember that compilers try to optimize their code, so there is no reason for them to produce slow code on purpose.
Sorta. This is actually a CPU cache thing: ARM can decode in parallel efficiently without needing a lot of CPU cache, while x86 requires more cache to do so. However, more cache has benefits beyond this task, and cache is also getting cheaper.
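
A rough sketch of the boundary-predictor idea, in Python. Everything here is an assumption made for illustration: the toy length encoding (low two bits of the first byte give a length of 1 to 4), the `BoundaryPredictor` class, and the table keyed by block address. It is not a description of how any shipping CPU does it. The point is that checking a predicted set of boundaries is a set of independent per-instruction checks, while a miss falls back to the slow serial scan.

```python
def toy_length(code: bytes, offset: int) -> int:
    """Hypothetical toy encoding: low 2 bits of the first byte give length 1..4."""
    return (code[offset] & 0b11) + 1


def serial_starts(code: bytes) -> list[int]:
    """Slow path: walk the block one instruction at a time."""
    starts, offset = [], 0
    while offset < len(code):
        starts.append(offset)
        offset += toy_length(code, offset)
    return starts


def prediction_holds(code: bytes, starts: list[int]) -> bool:
    """Verify predicted starts; each check is independent, so it could run in parallel."""
    if not starts or starts[0] != 0:
        return False
    ends = starts[1:] + [len(code)]
    return all(
        s < len(code) and min(s + toy_length(code, s), len(code)) == e
        for s, e in zip(starts, ends)
    )


class BoundaryPredictor:
    """Toy predictor: remembers the boundaries last seen at each block address."""

    def __init__(self) -> None:
        self.table: dict[int, list[int]] = {}
        self.hits = 0
        self.misses = 0

    def decode(self, addr: int, code: bytes) -> list[int]:
        predicted = self.table.get(addr)
        if predicted is not None and prediction_holds(code, predicted):
            self.hits += 1  # fast path: prediction verified
            return predicted
        self.misses += 1  # slow path: serial scan, then train the table
        starts = serial_starts(code)
        self.table[addr] = starts
        return starts


block = bytes([0x02, 0xAA, 0xBB, 0x00, 0x01, 0xCC, 0x03, 0x11, 0x22, 0x33])
predictor = BoundaryPredictor()
predictor.decode(0x1000, block)  # first visit: miss, table is trained
predictor.decode(0x1000, block)  # second visit: hit
print(predictor.hits, predictor.misses)  # 1 1
```

In this model, code that runs repeatedly from the same address (loops, hot functions) hits the fast path after the first visit, so the cost lands on cold or changing code.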
That still implies both more logic and more "hot" silicon, so decoding carries higher overhead.

I recall reading about creating a subset of x86_64 that would be faster to decode, but this would effectively be a different architecture so at that point you might as well go to ARM64 or RISC-V.

I do know that if the instruction set decodes efficiently and is compact (to reduce memory bandwidth) it really doesn't matter much beyond that.

>I do know that if the instruction set decodes efficiently and is compact (to reduce memory bandwidth) it really doesn't matter much beyond that.

RISC-V is also simple, and that's relative to ARM64, never mind x86.

I.e. it is achieving highly competitive code density and instruction count despite being simpler.

It doesn't matter, though; technology is ever evolving. More cache will eventually be the norm on chips. Wide lanes for threads too.

The M1 has four times the bit width of an AMD Ryzen processor. Supposedly the next generation of Ryzen processors, Zen 5, will have a wider bit width.