by annilt 2022 days ago
Just want to add one thing: x86 has stronger memory “semantics”. It doesn’t have to work that way behind the scenes; it only has to appear, at the end of the block, to have worked that way. So x86 does a lot of reordering, store combining, etc. IMHO, the performance difference between ARM and x86 is barely related to the ISA. In the M1’s case it definitely isn’t: a lot more is going on than just taking advantage of a weaker memory model.
1 comments

Having to appear to have worked that way does impose restrictions in the multiprocessor case. ARM chips naturally do all of that too, with the memory model simply giving them far more freedom to reorder things.
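
To make the "freedom to reorder" point concrete, here is a toy sketch of the classic message-passing litmus test. It is not a model of real hardware: it assumes the only difference between the two settings is whether each thread's two independent memory operations may swap, and the names (`WRITER`, `READER`, `outcomes`) are made up for illustration. Keeping program order stands in for an x86-TSO-like model on this particular test; allowing the swap stands in for a weaker ARM-like model with no barriers or dependencies.

```python
from itertools import permutations

# Message-passing litmus test (hypothetical names):
#   thread 0: data = 1; flag = 1
#   thread 1: r1 = flag; r2 = data
WRITER = [("store", "data", 1), ("store", "flag", 1)]
READER = [("load", "flag", "r1"), ("load", "data", "r2")]


def interleavings(a, b):
    """Yield every merge of lists a and b that keeps each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def run(schedule):
    """Execute one global schedule against a single shared memory."""
    memory = {"data": 0, "flag": 0}
    regs = {}
    for kind, addr, arg in schedule:
        if kind == "store":
            memory[addr] = arg
        else:
            regs[arg] = memory[addr]
    return regs["r1"], regs["r2"]


def outcomes(allow_reordering):
    """Collect every (r1, r2) result reachable under the toy model.

    allow_reordering=False: each thread runs in program order.
    allow_reordering=True:  each thread's two operations may be swapped
                            (a simplification of a weak memory model).
    """
    if allow_reordering:
        writer_orders = list(permutations(WRITER))
        reader_orders = list(permutations(READER))
    else:
        writer_orders = [tuple(WRITER)]
        reader_orders = [tuple(READER)]
    results = set()
    for w in writer_orders:
        for r in reader_orders:
            for schedule in interleavings(list(w), list(r)):
                results.add(run(schedule))
    return results


if __name__ == "__main__":
    print(sorted(outcomes(False)))  # [(0, 0), (0, 1), (1, 1)]
    print(sorted(outcomes(True)))   # [(0, 0), (0, 1), (1, 0), (1, 1)]
```

The extra outcome `(1, 0)`, where the reader sees the flag set but still reads stale data, only shows up when reordering is allowed. Ruling that outcome out on every core is the kind of restriction the stronger model has to pay for.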

One couldn’t build an x86 version of the M1, mostly because there is no way of making an instruction decoder that wide for x86.

And the performance penalty the M1 takes when running in TSO mode strongly implies that yes, the weaker memory model does play a major role. Not the biggest, but definitely not insignificant. Tens of percent here and tens of percent there combine into a ridiculous perf boost.