by ajross 1556 days ago
Memory barrier instructions on ARM exist to force ordering of memory operations as seen by external hardware (generally software on other cores). They obviously interact with the cache at a hardware level, but they're different layers of abstraction.

And the reason those memory operations might be seen in an order different from their appearance in the machine code is precisely the fact that the processor executes them in parallel and potentially out of order. On x86, the hardware does magic (in almost all cases) to prevent this artifact. But ARM puts the responsibility on the programmer.
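To make the reordering concrete, here is a minimal sketch in C11 atomics of the usual "write data, then set a flag" pattern. The names (`publish`, `try_consume`, `payload`, `ready`, `smoke_check`) are made up for illustration and are not from the linked article. The release/acquire pair is what tells the compiler to emit whatever ordering instructions the target needs; on x86 that is usually nothing extra, while on ARM64 it generally is.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;        /* ordinary data, written before the flag */
static atomic_bool ready;  /* flag polled by the other core */

/* Producer side: write the data, then publish with a release store.
   Release ordering keeps the payload write from becoming visible
   after the flag. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer side: the acquire load pairs with the release store, so a
   consumer that observes ready == true also observes the payload. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}

/* Single-threaded smoke check of the API. The ordering guarantee only
   matters when publish() and try_consume() run on different cores.
   Returns the consumed value, or -1 if a step misbehaved. */
int smoke_check(void) {
    int seen = -1;
    atomic_store(&ready, false);
    if (try_consume(&seen))
        return -1;          /* nothing has been published yet */
    publish(42);
    if (!try_consume(&seen))
        return -1;
    return seen;
}
```

If both operations were relaxed instead, a weakly ordered core would be allowed to make the flag visible before the payload, which is exactly the artifact described above.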

But all that stuff is specified (even if it's hard to reason about). What's happening here is outside the specification: something about the way that cache invalidate and barrier interact can be messed up by an interrupt. But we don't know what it is, because it seems like ARM didn't tell anyone.

Basically: as I see it, any OS author writing interrupt entry code on ARM64 (I work on Zephyr, though not on the ARM port) needs to put a barrier instruction on the entry path for safety, because at least some hardware misbehaves without it. But that said, almost all real OSes are going to have one anyway for locking purposes (i.e. you have to take a spinlock to interact with OS state somewhere, and that requires a barrier on SMP ARM systems). It's likely that this Nintendo sequence is part of some kind of micro-optimized thing and not a general purpose ISR.
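For the spinlock point, a rough sketch of what I mean, again in C11 atomics. This is not Zephyr's implementation or anyone else's; `demo_spinlock`, `demo_lock`, `demo_unlock`, `state_lock`, `os_state` and `bump_state` are all hypothetical names. The acquire on lock and release on unlock are where the ordering comes from on an SMP ARM system. Whether that ordering is the same kind of barrier the Nintendo sequence needs is a separate question that this sketch doesn't answer.

```c
#include <stdatomic.h>

/* Hypothetical minimal spinlock, for illustration only. */
typedef struct {
    atomic_flag held;
} demo_spinlock;

static demo_spinlock state_lock = { ATOMIC_FLAG_INIT };
static int os_state;  /* stand-in for shared kernel state */

/* Spin until the flag was previously clear. Acquire ordering keeps
   the critical section's accesses from moving above the lock. */
void demo_lock(demo_spinlock *l) {
    while (atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire))
        ;  /* busy-wait */
}

/* Release ordering keeps the critical section's accesses from
   moving below the unlock. */
void demo_unlock(demo_spinlock *l) {
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}

/* Touch the shared state under the lock and return its new value. */
int bump_state(void) {
    demo_lock(&state_lock);
    int v = ++os_state;
    demo_unlock(&state_lock);
    return v;
}
```

An interrupt handler that calls something like `bump_state()` gets this ordering on its path as a side effect of taking the lock.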


On what basis are you saying the interrupt messes it up?

The post directly says that if a migration doesn't happen, then nothing goes wrong.

What messes up is when you do the barrier-needing instructions on one core, and a memory barrier on a completely different core. Which seems pretty expected to me.

If the thing you're describing happens, that does sound like a hardware bug, but I don't see where you got that description from.

> On what basis are you saying the interrupt messes it up?

Because nothing else makes sense. The code as posted in the linked article does not seem to have an ordering violation that I can see. The linked blog just asserts that it's there, but AFAICT it isn't unless there's a symmetric ordering bug in the putative context switch code that isn't presented.

The problem is very simple, isn't it? There are instructions that need a memory barrier after them. If the thread leaves the core, then from the view of that core no memory barrier happens.

And it's a particular kind of memory barrier that nothing else does incidentally.