After rereading the article, I think the "missing bit" (which the article actually touches on tangentially) is that the barrier isn't needed to synchronize the two cores with each other, but to synchronize with other hardware, for example the GPU (hence the note about graphics glitches). So the context-switching code needs to issue the barrier from the correct core, i.e. the one the task is migrating away from. The Linux kernel, for example, always issues the additional I/O barrier on core migration.
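To make the "correct core" point concrete, here is a toy Python model, not real kernel code and not how Linux implements it. The names `Core`, `Task`, `mmio_write`, `io_barrier` and `migrate` are all hypothetical, and "posted device writes sit in a per-core buffer until that core issues a barrier" is a deliberate simplification of real hardware. The point it shows is that a barrier only drains the writes of the core that executes it, so the scheduler has to issue it on the old core before the task runs elsewhere.

```python
# Toy model (assumption, not kernel code): each core buffers its own
# posted device writes; only a barrier executed on THAT core drains them.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Core:
    cpu_id: int
    pending_mmio: List[str] = field(default_factory=list)  # not yet seen by the device
    barriers_issued: int = 0

    def mmio_write(self, value: str) -> None:
        # Hypothetical posted write to e.g. a GPU register: buffered on this core.
        self.pending_mmio.append(value)

    def io_barrier(self, device_log: List[str]) -> None:
        # Hypothetical I/O barrier: makes THIS core's buffered writes visible.
        device_log.extend(self.pending_mmio)
        self.pending_mmio.clear()
        self.barriers_issued += 1


@dataclass
class Task:
    name: str
    cpu: Optional[int] = None  # core the task last ran on (None = never ran)


def migrate(task: Task, cores: List[Core], new_cpu: int, device_log: List[str]) -> None:
    # Issue the barrier on the core the task is LEAVING, before it can run
    # on the new one; a barrier on the new core would not drain these writes.
    old = task.cpu
    if old is not None and old != new_cpu:
        cores[old].io_barrier(device_log)
    task.cpu = new_cpu


device_log: List[str] = []
cores = [Core(i) for i in range(2)]
renderer = Task("renderer")
migrate(renderer, cores, 0, device_log)       # first run on core 0: no barrier
cores[renderer.cpu].mmio_write("draw cmd")    # write buffered on core 0
migrate(renderer, cores, 1, device_log)       # 0 -> 1: barrier runs on core 0
print(device_log, cores[0].barriers_issued, cores[1].barriers_issued)
```

In this model, calling `io_barrier` on `cores[1]` instead would leave `device_log` empty: the "draw cmd" write would still be stuck behind core 0, which is the kind of ordering bug that could show up as graphics glitches.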