by dirtypersian 2034 days ago
I didn't really understand the TSO explanation given in this article and found it a bit hand-wavy. The article says that to emulate the x86 TSO consistency model on an ARM machine, which is weakly ordered, you have to add a bunch of instructions, which would make the emulation slow. I followed that much, but after that it doesn't really explain how they get around the extra instructions needed to guarantee the ordering. It just says "oh, it's a hardware toggle"; toggle of what exactly?
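
To make the "extra instructions" concrete, here is a minimal sketch of the classic message-passing pattern in C++. It is an illustration only, not anything taken from the article or from Rosetta2; the function name `run_message_passing` is made up for this example. On x86 (TSO) the release store and acquire load below compile to ordinary moves, because the hardware already keeps store-store and load-load order. On a weakly ordered ARM core the compiler has to emit ordered instructions (`stlr`/`ldar`) or barriers instead, and a translator that cannot tell which x86 accesses matter would have to do that for every one of them.

```cpp
#include <atomic>
#include <thread>

// Message-passing litmus test: one thread publishes a value, the other
// waits for a flag and then reads the value.
// (Hypothetical helper name, for illustration only.)
int run_message_passing() {
    int payload = 0;                  // ordinary, non-atomic data
    std::atomic<bool> ready{false};   // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // Release: the write to payload may not be reordered after this store.
        // Free on x86 (TSO); needs an ordered store or barrier on weakly
        // ordered ARM.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // Acquire: the read of payload may not be reordered before this load.
        while (!ready.load(std::memory_order_acquire)) {
            // spin until the producer has published
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    return seen;
}
```

With the release/acquire pair in place the function returns 42 on any conforming implementation. If both orderings were weakened to `memory_order_relaxed`, the program would have a data race, and on a weakly ordered machine the consumer could in principle observe the flag before the payload. x86 hardware does not perform that reordering, and it is the guarantee an emulator has to preserve.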

I could see them just not following TSO for single-core stuff, e.g. when running emulated code for single-core performance benchmarks, since technically you don't care about ordering for single-core operation/correctness. That would speed up their single-core stuff, but then what about multi-core?

> It just says "oh, it's a hardware toggle"; toggle of what exactly?

A toggle that makes the chip treat all loads and stores from that thread as TSO.

So you're saying Rosetta2 somehow looks at an x86 binary, figures out exactly which portions of the program rely on TSO ordering for correctness, and then dynamically switches to weak ordering for the parts that might be able to do without it?

I don't really know much about the internals of macOS, but figuring out when, for example, applications running on two different cores need to access the same memory (since TSO is only really needed for multi-core use cases), and then applying TSO on the fly like that, seems difficult. If that is what Rosetta2 is actually doing, that is impressive.

AFAIK: Apple Silicon features an MSR you can toggle which swaps the memory model for a core between ARM's relaxed model and x86's TSO model, all at once. When Rosetta2 launches and translates an app, it simply tells the kernel that the process, whenever it is given a slice of CPU time, should use the TSO memory model, not the relaxed one. Only Rosetta2 can request this feature. That's about all there is to it, and it does this whether the app is multicore or not (yes, TSO is only needed in multicore, but enabling it unconditionally is simpler and has no downsides for emulating single-core x86 apps).
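
A rough sketch of that scheme as a toy simulation in C++, purely to show the shape of the idea. Everything here is a made-up stand-in: `Core`, `Thread`, `kTsoEnableBit`, `context_switch_in` and `core_is_tso` are hypothetical names, and the real register, bit position and kernel interface are not given anywhere in this thread.

```cpp
#include <cstdint>

// Hypothetical bit position; the real register layout is not described here.
constexpr std::uint64_t kTsoEnableBit = 1ull << 0;

// Stand-in for a CPU core with a per-core memory-model control register.
struct Core {
    std::uint64_t mem_model_ctl = 0;
};

// Stand-in for the kernel's per-thread state. A translated x86 process
// would have wants_tso set once, at launch; native ARM code leaves it false.
struct Thread {
    bool wants_tso = false;
};

// On every context switch the (simulated) kernel sets or clears the bit
// to match the thread that is about to run on this core.
void context_switch_in(Core& core, const Thread& next) {
    if (next.wants_tso) {
        core.mem_model_ctl |= kTsoEnableBit;
    } else {
        core.mem_model_ctl &= ~kTsoEnableBit;
    }
}

// True if the core is currently in TSO mode.
bool core_is_tso(const Core& core) {
    return (core.mem_model_ctl & kTsoEnableBit) != 0;
}
```

The point of the design is that nothing analyzes the x86 binary: the flag is per process, the switch happens on scheduling, and the translated code can then use plain loads and stores.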

There's also a similar MSR for 4k vs 16k page sizes I think, another x86 vs Apple Silicon discrepancy, but I'm not sure if Rosetta2 uses that, too.

I think I understand now. Rosetta is just doing translation from x86 to ARM; however, native ARM doesn't have a notion of TSO, which means they're still putting in the logic to maintain TSO just to get better emulation performance. On a purely ARM machine I guess that logic wouldn't be needed.

No; it’s not explained very well, but the M1 chip features hardware support for TSO, which Rosetta2 uses.

It’s really ‘Apple Silicon’ and not just ARM.

> It's really 'Apple Silicon' and not just ARM.

Yeah, I think that's key to understanding this. They are supporting a version of the ARM ISA that maintains TSO, even though official ARM doesn't need to support TSO. I guess this is all to get better emulation performance and avoid the extra synchronization instructions that Rosetta would have to add if the silicon did not have TSO support.