|
|
|
|
|
by gpderetta
767 days ago
|
|
Note that writing 64 bits and reading 32 (or viceversa) is not a way to get around fences on x86. It is explicitly documented as begin undefined. In most cases it will fail to store-forward that will stall and act as an implicit fence, but in some cases the CPU can do partial store forwarding, breaking it. AFAIK this trick does work on SPARC though. |
|
Intel's latest uarch does partial store forwarding.
There was a paper from a few years ago trying to define semantics for mixed-size accesses (not for x86 though) https://www.cl.cam.ac.uk/~pes20/popl17/mixed-size.pdf
I don't think the parent was talking about this, though; they were just talking about using a single large physical location, which logically contains multiple smaller values. Accesses to a single location happen order, so there is indeed no need for fencing between accesses to it. Usually you get a full 128 bits (at least amd64/aarch64/ppc64; not riscv yet but I expect they will get there).
That said—mixed-size can be useful despite the lack of semantics (I think linux uses them in a few places?). sooo