Back around 2012 I worked with a guy, a FreeBSD kernel committer, who insisted volatile was sufficient as a thread synchronization primitive. He convinced our boss.
Volatile doesn’t even guarantee that data is written atomically in one step rather than, e.g., byte-wise. It also allows both the compiler and the CPU to reorder it with any read or write. I can’t think of anything it could be used for in a multithreaded environment.
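For contrast, here is a minimal sketch of what portable C++11 offers instead, assuming the goal is simply to hand one value from one thread to another. The function name `run_handoff` and the variables are made up for illustration; the point is that `std::atomic` with release/acquire ordering gives both the atomicity and the ordering that volatile does not promise.

```cpp
#include <atomic>
#include <thread>

// Hypothetical example: hand one value from a producer thread to a consumer.
int run_handoff() {
    int payload = 0;                 // plain data, published via the flag below
    std::atomic<bool> ready{false};  // the actual synchronization point

    std::thread producer([&] {
        payload = 42;                                  // write the data first
        ready.store(true, std::memory_order_release);  // then publish it
    });

    // Spin until the flag is seen. The acquire load pairs with the release
    // store, so the write to payload is guaranteed to be visible afterwards.
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    int seen = payload;

    producer.join();
    return seen;
}
```

Replacing `std::atomic<bool>` with `volatile bool` here would make the access to `ready` a data race, and nothing would order the write to `payload` before the flag as far as the language is concerned.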
There is really only a single place volatile actually works, and that is for memory-mapped hardware registers. Anybody who says it is useful for anything else is badly mistaken.
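A rough sketch of that hardware-register case, with the caveat that the register layout, the `kTxReady` bit, and the base address in the comment are all invented for illustration and do not describe any real device. Here volatile does its one job: every read of `status` and every write of `data` is actually emitted, in program order relative to the other volatile accesses.

```cpp
#include <cstdint>

// Hypothetical UART register block; the field layout is an assumption.
struct UartRegs {
    volatile std::uint32_t status;  // device updates this behind our back
    volatile std::uint32_t data;    // writing here has a hardware side effect
};

constexpr std::uint32_t kTxReady = 1u << 0;  // assumed "transmitter ready" bit

// On real hardware the block would sit at a fixed address, e.g.
//   auto& uart = *reinterpret_cast<UartRegs*>(0x40000000);  // made-up address
// Taking a reference instead lets the logic be exercised against a fake.
bool uart_try_send(UartRegs& uart, std::uint8_t byte) {
    if ((uart.status & kTxReady) == 0) {
        return false;  // transmitter busy: the status read is not optimized away
    }
    uart.data = byte;  // the store is always emitted, never elided or merged
    return true;
}
```

Without volatile, a compiler would be free to hoist the `status` read out of a polling loop or drop repeated writes to `data`, which is exactly what you do not want when the other side is a device.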
Except in MSVC, where it kinda/sorta means atomic.
It allows the compiler to reorder it with any non-volatile read or write. Of course, it still doesn't indicate anything to the CPU. In single-core embedded systems, where the CPU doesn't reorder anything and you know the compiler is going to emit a single instruction for a read or write, it can be sufficient (for example, this is how FreeRTOS implements all of its threading primitives).
This definitely wasn't one. It didn't bite in any obvious way because Intel, and because the system already had so many other bugs. (Free advice: don't take on a C++ program written by a Java coder.)
Although, any particular thing happening to be one of those is a pretty rare event, so odds are good that this wasn't one.