by annilt
2022 days ago

Just want to add one thing: x86 has stronger memory “semantics”. The hardware doesn’t have to work that way behind the scenes; by the end of the block it only has to appear to have worked that way. So x86 cores do a lot of reordering, store combining, etc. IMHO, the performance difference between ARM and x86 is barely related to the ISA, and in the M1’s case it definitely isn’t: a lot more is going on than just taking advantage of a weaker memory model.
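To make the memory-model point concrete, here is a minimal sketch of the classic message-passing pattern. It is written in Rust purely for illustration, since nothing in the thread uses it. A writer thread publishes a value and then raises a flag. The `Release` store and `Acquire` load guarantee that a reader who sees the flag also sees the value, on any architecture. On x86 (TSO), stores are already kept in order with each other, and loads with each other, so these orderings typically compile to plain `mov`s. On AArch64 the compiler typically emits ordered instructions (`stlr`/`ldar`) only for the accesses marked this way, and leaves the `Relaxed` ones free to be reordered. The function name `message_passing` is hypothetical.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: one thread publishes a payload, the caller reads it.
fn message_passing() -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (Arc::clone(&data), Arc::clone(&ready));
    let writer = thread::spawn(move || {
        // Payload store: Relaxed on its own, ordered only by the Release below.
        d.store(42, Ordering::Relaxed);
        // Release: the payload store cannot become visible after this flag.
        r.store(true, Ordering::Release);
    });

    // Acquire: once the flag is seen, the payload load cannot be hoisted above it.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let value = data.load(Ordering::Relaxed);
    writer.join().unwrap();
    value
}

fn main() {
    println!("{}", message_passing());
}
```

Under a weak model, only the marked accesses pay for ordering. Under TSO (x86, or the M1's TSO mode), every ordinary store stays ordered with other stores and every load with other loads, whether or not the program needs it.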

One couldn’t build an x86 version of the M1, mostly because there is no way to make an x86 instruction decoder that wide. x86 instructions are variable-length, so finding instruction boundaries in parallel gets expensive fast.
And the performance penalty the M1 shows when running in TSO mode strongly implies that, yes, the weaker memory model does play a major role. It is not the biggest factor, but it is definitely not insignificant. Tens of percent here and tens of percent there add up to a ridiculous performance boost.