Hacker News
by nialv7 208 days ago
I think that's right: there is no better way than just adding barriers. On Apple hardware it can probably use the special memory-ordering mode (Apple Silicon's hardware TSO mode), but on other ARM64 hardware there's probably nothing better it can do.
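A minimal sketch of what "just adding barriers" can look like, written with C11 atomics rather than any particular emulator's code. It assumes guest x86 plain stores are lowered to release stores and plain loads to acquire loads (on AArch64, compilers typically emit STLR and LDAR or LDAPR for these). The names `guest_store`, `guest_load`, and `run_message_passing` are hypothetical. It only exercises the message-passing pattern, which x86 code commonly relies on and which plain ARM64 loads and stores do not guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical guest memory: two words that x86 code would access
   with plain MOVs. */
static atomic_int guest_data;
static atomic_int guest_flag;

/* Hypothetical lowering of a guest x86 store: x86 never reorders a
   store with earlier loads or stores, so use release. */
static void guest_store(atomic_int *addr, int value) {
    atomic_store_explicit(addr, value, memory_order_release);
}

/* Hypothetical lowering of a guest x86 load: x86 never reorders a
   load with later loads or stores, so use acquire. */
static int guest_load(atomic_int *addr) {
    return atomic_load_explicit(addr, memory_order_acquire);
}

static void *producer(void *arg) {
    (void)arg;
    guest_store(&guest_data, 42); /* write the payload first */
    guest_store(&guest_flag, 1);  /* then publish it */
    return NULL;
}

static void *consumer(void *arg) {
    int *seen = arg;
    while (guest_load(&guest_flag) == 0) {
        /* spin until the producer publishes */
    }
    *seen = guest_load(&guest_data);
    /* With relaxed (plain) accesses, ARM64 could let this read return 0
       even after flag == 1 was observed; acquire/release forbids that. */
    assert(*seen == 42);
    return NULL;
}

/* Runs one round of the message-passing test and returns the payload
   the consumer observed. */
int run_message_passing(void) {
    pthread_t p, c;
    int seen = 0;
    atomic_store(&guest_data, 0);
    atomic_store(&guest_flag, 0);
    pthread_create(&c, NULL, consumer, &seen);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

The cost is that every guest access pays for an ordered instruction, even the ones that never race, which is part of why a hardware TSO mode is attractive where it exists.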

There's one trick: run all of those threads on one CPU. But that may be slower than using barriers across multiple CPUs, unless the code spends a lot of its time in library code that can be emulated directly and separately on the other CPUs.
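A rough, Linux-only sketch of the single-CPU trick, assuming the emulator runs each guest thread as a host pthread: pin the main thread to one core before spawning the guest threads, which then inherit that affinity mask. Since every thread shares one core, the cross-core reorderings ARM64 permits should never become visible between them (this assumes the kernel's context switch supplies the ordering, as Linux does). `pin_to_single_cpu`, `guest_thread`, and `spawned_thread_cpu_count` are hypothetical names; `sched_getaffinity`, `sched_setaffinity`, and `sched_getcpu` are the real glibc/Linux calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Pins the calling thread to the lowest-numbered CPU it is currently
   allowed to run on. Threads created afterwards inherit this mask.
   Returns the chosen CPU, or -1 on failure. */
int pin_to_single_cpu(void) {
    cpu_set_t allowed;
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            cpu_set_t one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            /* pid 0 means "the calling thread" on Linux. */
            if (sched_setaffinity(0, sizeof(one), &one) != 0)
                return -1;
            return cpu;
        }
    }
    return -1;
}

struct guest_report {
    int ncpus;  /* CPUs the thread may be scheduled on */
    int ran_on; /* CPU it was actually running on */
};

/* Hypothetical stand-in for an emulated guest thread: it only reports
   its affinity and where it ran. */
static void *guest_thread(void *arg) {
    struct guest_report *r = arg;
    cpu_set_t mask;
    r->ncpus = (sched_getaffinity(0, sizeof(mask), &mask) == 0)
                   ? CPU_COUNT(&mask)
                   : -1;
    r->ran_on = sched_getcpu();
    return NULL;
}

/* Pins, spawns one guest thread, and returns how many CPUs that thread
   may use (expected: 1), or -1 on failure. */
int spawned_thread_cpu_count(void) {
    struct guest_report r = {0, -1};
    pthread_t t;
    int cpu = pin_to_single_cpu();
    if (cpu < 0)
        return -1;
    if (pthread_create(&t, NULL, guest_thread, &r) != 0)
        return -1;
    pthread_join(t, NULL);
    /* The inherited mask should have kept the thread on the pinned core. */
    assert(r.ran_on == cpu);
    return r.ncpus;
}
```

The trade-off is the one described above: all guest threads now share a single core's throughput, so this only wins when the barrier overhead costs more than the lost parallelism.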