by gpderetta 1556 days ago
First of all, it is amazing that the author managed to analyze the patch in so much detail; it was probably an effort comparable to the bug fix itself. Still, I think the article is missing some bits. I would expect any core migration to require barriers (either implicit or explicit) on both the old and the new core; otherwise the process would risk seeing its own stores and loads out of order.

But in this case the barrier is predicated on the execution of some cache manipulation instruction, so I suspect things are more complicated. Maybe these specific cache manipulation instructions do not respect the usual architectural memory ordering and require a different set of barriers. Possibly they bypass cache coherence completely and require an actual flush of the cache. That would be very expensive, and it makes sense that it is done only if the process was actually fiddling with these instructions. 'jmgao' reported elsewhere in the thread that Tegra has coherency issues on migration, so it might be related.

2 comments

> But in this case the barrier is predicated on the execution of some cache manipulation instruction, so I suspect things are more complicated.

Why do you think so? The explanation given seems reasonable to me…

As I said, I would expect the barriers to be needed unconditionally on a core migration. The fact that there is a special flag that is set when (and only when) the cache control instructions are used seems to point to some special handling specifically for those instructions.

Edit: having read the page for the nth time, I think I finally understand your point. The code using the cache instructions already had an explicit barrier, but after a migration it would be executed on the wrong core.

I know nothing about the ARM memory model, but the dsb sy barrier is likely stronger than what is needed for inter-core communication, and it is needed for I/O serialisation, for example with a memory-mapped PCI device.

So yes, the article is clear and likely correct, I just failed to understand it fully originally.
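
To make the barrier-strength point above a bit more concrete, here is a minimal sketch in C, assuming an AArch64 target and GCC/Clang inline assembly. As far as I understand it, `dmb ish` only orders memory accesses among cores in the inner shareable domain, while `dsb sy` waits for outstanding accesses and cache maintenance to complete system-wide. The wrapper names and the `mailbox` struct are hypothetical, and on other targets the sketch falls back to a C11 fence just so it still compiles.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical wrapper: ordering that is enough for ordinary
 * core-to-core communication through coherent memory. */
void barrier_intercore(void) {
#if defined(__aarch64__)
    __asm__ volatile("dmb ish" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* portable stand-in */
#endif
}

/* Hypothetical wrapper: the stronger barrier discussed in the thread,
 * which also covers cache maintenance and other observers such as
 * devices. */
void barrier_full_system(void) {
#if defined(__aarch64__)
    __asm__ volatile("dsb sy" ::: "memory");
#else
    atomic_thread_fence(memory_order_seq_cst); /* portable stand-in */
#endif
}

/* Illustrative structure shared between two cores. */
struct mailbox {
    int payload;
    _Atomic bool ready;
};

/* Publish a value to another core: write the payload, order it before
 * the flag, then set the flag. The lighter barrier is sufficient here
 * because only other cores read the mailbox. */
void mailbox_publish(struct mailbox *m, int value) {
    m->payload = value;
    barrier_intercore();
    atomic_store_explicit(&m->ready, true, memory_order_relaxed);
}
```

The heavier `barrier_full_system()` would only be called where something outside the coherent cores has to observe the result, which is the case the article seems to be about.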

Very classy!
After rereading the article, the "missing bit", which is actually touched on tangentially in the article, is that the barrier is not needed to synchronize between the two cores, but to synchronize with other hardware, for example the GPU (hence the note about graphics glitches). So the context-switching code needs to issue the barrier from the correct core. The Linux kernel, for example, always issues the additional I/O barrier on core migration.
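
The "barrier from the correct core" idea can be sketched as a small single-threaded simulation in C. This is not the actual patch: every name here (`sim_thread`, `did_cache_maint`, `sim_switch_out`, and so on) is hypothetical, and the `dsb sy` instruction is replaced by a per-core counter so that it is possible to see which core issued the barrier.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_CORES 4

/* Hypothetical per-thread state kept by the scheduler. */
struct sim_thread {
    int  core;            /* core the thread is currently assigned to */
    bool did_cache_maint; /* set when (and only when) the thread has
                             used a cache maintenance instruction */
};

/* Number of full barriers each simulated core has issued. */
int barriers_issued[NUM_CORES];

/* Stand-in for `dsb sy`: only records which core executed it. */
void sim_full_barrier(int core) {
    assert(core >= 0 && core < NUM_CORES);
    barriers_issued[core]++;
}

/* The thread executes a cache maintenance instruction on its core. */
void sim_cache_maintenance(struct sim_thread *t) {
    t->did_cache_maint = true;
}

/* Context switch away from the thread. If the flag is set, the barrier
 * is issued here, while still on the core that started the cache
 * maintenance, and the flag is cleared. */
void sim_switch_out(struct sim_thread *t) {
    if (t->did_cache_maint) {
        sim_full_barrier(t->core);
        t->did_cache_maint = false;
    }
}

/* Resume the thread, possibly on a different core. */
void sim_switch_in(struct sim_thread *t, int new_core) {
    assert(new_core >= 0 && new_core < NUM_CORES);
    t->core = new_core;
}

/* The thread's own explicit barrier after the cache instruction. After
 * a migration this runs on the new core, which is too late for the
 * old one. */
void sim_thread_barrier(struct sim_thread *t) {
    sim_full_barrier(t->core);
}
```

If a thread calls `sim_cache_maintenance()` on core 0, is switched out, and resumes on core 1, its own `sim_thread_barrier()` lands on core 1, and only the conditional barrier in `sim_switch_out()` covers core 0. An unconditional policy, like the one attributed to Linux above, would simply drop the flag check.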