by Dylan16807
1556 days ago
I don't see anything about instruction reordering? Where'd you get that from? And I don't really see how the interaction with interrupt entry could be the problem, since the code works just fine if the code in the interrupt leaves the thread on the same core.

> ARM caches are (normally) coherent, though.

Don't you need a memory barrier to get that coherency, though?
And the reason those memory operations might be seen in an order different from their appearance in the machine code is precisely the fact that the processor executes them in parallel and potentially out of order. On x86, the hardware does magic (in almost all cases) to prevent this artifact. But ARM puts the responsibility on the programmer.
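To make the ordering point concrete, here is a minimal sketch in portable C11 of the usual "write data, then set a flag" pattern. The names `payload`, `ready`, `publish` and `try_consume` are made up for illustration and are not from the code under discussion. As far as I know, mainstream compilers emit plain loads and stores for this on x86, while on AArch64 they typically emit store-release/load-acquire instructions (or explicit barriers), which is the "responsibility on the programmer" part.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
static int payload;                /* plain data, written before the flag */
static atomic_bool ready = false;  /* flag that publishes the payload */

/* Producer: write the data, then set the flag with release ordering, so the
 * payload store cannot become visible to other cores after the flag store. */
void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: read the flag with acquire ordering. If the flag is set, the
 * payload written before the matching release store is visible here.
 * Returns false (and leaves *out untouched) if nothing was published yet. */
bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

With `memory_order_relaxed` in both places, the language would no longer guarantee that a consumer seeing `ready == true` also sees the new `payload`, and a weakly ordered machine is where that gap tends to show up in practice.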
But all that stuff is specified (even if it's hard to reason about). What's happening here is outside the specification: that cache invalidate and the barrier interact in some way that an interrupt can mess up. But we don't know what it is, because it seems like ARM didn't tell anyone.
Basically: as I see it, any OS author writing interrupt entry code on ARM64 (I work on Zephyr, though not on the ARM port) needs to put a barrier instruction on the entry path for safety, because at least some hardware misbehaves without it. But that said, almost all real OSes are going to have one anyway for locking purposes (i.e. you have to take a spinlock to interact with OS state somewhere, and that requires a barrier on SMP ARM systems). It's likely that this Nintendo sequence is part of some kind of micro-optimized thing and not a general purpose ISR.
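For anyone wondering where the barrier in a spinlock comes from, a rough sketch with C11 atomics follows. This is not Zephyr's or Nintendo's code; `demo_spinlock` and the `demo_*` functions are hypothetical names. The acquire on lock and release on unlock are what the compiler turns into ordering instructions on SMP ARM. Whether that ordering is strong enough to cover the hardware behaviour discussed above is an assumption I can't verify, so treat it purely as an illustration of the locking point.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical minimal spinlock, for illustration only.
 * Initialize with: demo_spinlock l = { ATOMIC_FLAG_INIT }; */
typedef struct {
    atomic_flag held;
} demo_spinlock;

/* Try once to take the lock. Acquire ordering keeps accesses inside the
 * critical section from being observed before the lock is taken. */
bool demo_trylock(demo_spinlock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Spin until the lock is taken. */
void demo_lock(demo_spinlock *l)
{
    while (!demo_trylock(l))
        ; /* busy-wait; a real kernel would also mask interrupts, back off, etc. */
}

/* Release ordering makes all writes done under the lock visible to the
 * next core that acquires it. */
void demo_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A real kernel lock also has to deal with interrupt masking, fairness and so on; the only thing this is meant to show is that taking and releasing the lock already implies memory ordering on the path.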