Edit: here is a good (but long) article with an x86 Peterson lock example, including an analysis of the critical race that occurs without the barrier:
https://bartoszmilewski.com/2008/11/05/who-ordered-memory-fe...