by Tuna-Fish 2401 days ago
On x86, the correct model is that on any core, all reads are in order, all writes are in order, and no write will ever be moved ahead of an earlier read on that core.

Or in other words, the only kind of visible reordering that is allowed to occur is that a write can be delayed past a later read.
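To make the "reads in order, writes in order" part concrete, here is a minimal message-passing sketch in Rust. The function name `message_pass` and the `data`/`ready` variables are made up for illustration. Portable code still has to ask for Release/Acquire ordering so the compiler keeps the order; on x86 those operations are expected to compile to plain `mov` instructions, because the hardware already keeps writes ordered with writes and reads ordered with reads.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: publishes `value` via a data word plus a ready flag
/// and returns what the reading thread observed in the data word.
fn message_pass(value: u32) -> u32 {
    let data = Arc::new(AtomicU32::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let writer = {
        let (data, ready) = (Arc::clone(&data), Arc::clone(&ready));
        thread::spawn(move || {
            data.store(value, Ordering::Relaxed); // first write
            ready.store(true, Ordering::Release); // second write, must not pass the first
        })
    };

    // First read: wait until the flag write is visible.
    while !ready.load(Ordering::Acquire) {
        thread::yield_now();
    }
    // Second read: ordered after the flag read, so it must see `value`.
    let seen = data.load(Ordering::Relaxed);

    writer.join().unwrap();
    seen
}
```

Calling `message_pass(42)` should always return 42: once the reader sees the flag, the earlier data write is guaranteed to be visible too.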

An example of a situation where this is significant:

    thread 1       thread 2
    mov [X], 0     mov [Y], 0
    mov [X], 1     mov [Y], 1
    mov r1, [Y]    mov r2, [X]
    
After this sequence of code, r1 == r2 == 0 is legal. (As is any other combination of 1 and 0.)
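The same test can be sketched in Rust to try it on real hardware. This is only an illustration under some assumptions: the names `side` and `store_buffer_litmus` are invented here, X and Y are initialized to 0 before the threads start (standing in for the `mov [X], 0` lines), and relaxed atomics stand in for plain `mov`s. With `fenced` set, a sequentially consistent fence (typically an `mfence` or a locked instruction on x86) sits between the write and the read, which rules out r1 == r2 == 0.

```rust
use std::sync::atomic::{fence, AtomicU32, Ordering};
use std::sync::{Arc, Barrier};
use std::thread;

/// One side of the test: write 1 to `mine`, optionally fence, then read `other`.
fn side(mine: &AtomicU32, other: &AtomicU32, fenced: bool) -> u32 {
    mine.store(1, Ordering::Relaxed);
    if fenced {
        // Forces the write above to become visible before the read below.
        fence(Ordering::SeqCst);
    }
    other.load(Ordering::Relaxed)
}

/// Runs the test `iterations` times and returns outcome counts,
/// indexed by r1 * 2 + r2 (so index 0 is r1 == r2 == 0).
fn store_buffer_litmus(iterations: usize, fenced: bool) -> [usize; 4] {
    let mut counts = [0usize; 4];
    for _ in 0..iterations {
        let x = Arc::new(AtomicU32::new(0));
        let y = Arc::new(AtomicU32::new(0));
        let start = Arc::new(Barrier::new(2));

        // Thread 2: write Y, read X.
        let t2 = {
            let (x, y, start) = (Arc::clone(&x), Arc::clone(&y), Arc::clone(&start));
            thread::spawn(move || {
                start.wait();
                side(&y, &x, fenced)
            })
        };

        // Thread 1 (the calling thread): write X, read Y.
        start.wait();
        let r1 = side(&x, &y, fenced);
        let r2 = t2.join().unwrap();
        counts[(r1 * 2 + r2) as usize] += 1;
    }
    counts
}
```

Without the fence, index 0 of the result may be nonzero on a multi-core x86 machine, though how often it shows up depends on timing and it may not appear at all in a short run. With the fence, index 0 must stay at zero.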

(edit:) And just to add, all of this reordering is of course impossible to detect on a single core: when a read hits a recent write on the same core, it reads the value out of the store queue. This can sometimes be really bad for performance, though. If you read a value that is only partially in the store queue (for example, write a 16-bit value to x, then immediately read 32 bits from x), some CPUs will stall that read, and all that follow it, until the entire store queue is flushed. Since the store queue can easily take tens if not hundreds of cycles to clear, this can be very expensive.
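As a rough sketch of that access pattern, here are two Rust functions (names invented for illustration) that compute the same result. The first does a 16-bit write followed by an overlapping 32-bit read; the second merges the value in a register so that the only write is as wide as the read that follows. This only shows the shape of the code: an optimizing compiler is free to rewrite either version, so measuring the stall would need volatile accesses or hand-written assembly.

```rust
/// Narrow write followed by an overlapping wide read (the pattern described above).
fn narrow_store_wide_load(slot: &mut [u8; 4], low: u16) -> u32 {
    slot[..2].copy_from_slice(&low.to_le_bytes()); // 16-bit write to x
    u32::from_le_bytes(*slot) // 32-bit read from x, only partly covered by the write
}

/// Same result, but the write is as wide as the read that follows it.
fn wide_store_wide_load(slot: &mut [u8; 4], low: u16) -> u32 {
    let old = u32::from_le_bytes(*slot);
    let merged = (old & 0xFFFF_0000) | u32::from(low); // merge in a register
    *slot = merged.to_le_bytes(); // 32-bit write to x
    u32::from_le_bytes(*slot) // 32-bit read, fully covered by the write
}
```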