by robryk
3542 days ago
If you're looking for weird synchronization primitives, look at the documentation of the DMA controller. It has a mode in which it stores bytes written to a particular address into a memory range, in the order the writes arrive. I haven't figured out a reasonable way to use that with multiple writers, though (except for the trivial case of a byte-based stream of bounded size).
The problem becomes a lot easier if you can reduce the multiple-writer case to the single-writer case. One idea that occurred to me is that since you have 1024 cores, it might make sense to dedicate a small fraction of them (say, 1/64) to synchronization. When you need to send a message to another process, you write to a nearby "router" that has a dedicated buffer to receive your data. The router can then serialize the message with respect to other messages and put it into the receiver's buffer.
Basically, you'd end up defining an "overlay network" on top of the native hardware support; you pay a latency cost, but you gain a lot of flexibility.
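To make the router idea concrete, here is a rough sketch as a single-threaded Python simulation, not real device code. Everything in it (`SpscRing`, `Router`, `send`, `step`, the buffer capacity) is a hypothetical name I made up for illustration. The point it tries to show is that every buffer has exactly one writer: each sender owns its inbox on the router, and the router is the only writer of each receiver's buffer.

```python
class SpscRing:
    """Bounded single-producer/single-consumer ring buffer (hypothetical)."""

    def __init__(self, capacity):
        self.slots = [None] * capacity
        self.head = 0  # next slot to read; only the consumer advances it
        self.tail = 0  # next slot to write; only the producer advances it

    def try_push(self, item):
        if self.tail - self.head == len(self.slots):
            return False  # full: the producer has to retry later
        self.slots[self.tail % len(self.slots)] = item
        self.tail += 1
        return True

    def try_pop(self):
        if self.head == self.tail:
            return None  # empty
        item = self.slots[self.head % len(self.slots)]
        self.head += 1
        return item


class Router:
    """A core dedicated to serializing messages (assumed design)."""

    def __init__(self, sender_ids, receiver_ids, capacity=8):
        # One dedicated inbox per sender, so each inbox has a single writer.
        self.inboxes = {s: SpscRing(capacity) for s in sender_ids}
        # The router is the only writer of each receiver's buffer.
        self.outboxes = {r: SpscRing(capacity) for r in receiver_ids}
        # A message that was popped but could not be delivered yet.
        self.pending = {s: None for s in sender_ids}

    def send(self, sender, dest, payload):
        """Called by a sender; writes only to that sender's own inbox."""
        return self.inboxes[sender].try_push((sender, dest, payload))

    def step(self):
        """One round-robin pass over the inboxes; returns messages forwarded."""
        forwarded = 0
        for sender, inbox in self.inboxes.items():
            msg = self.pending[sender]
            if msg is None:
                msg = inbox.try_pop()
            if msg is None:
                continue
            if self.outboxes[msg[1]].try_push(msg):
                self.pending[sender] = None
                forwarded += 1
            else:
                self.pending[sender] = msg  # receiver full: retry next pass
        return forwarded


def demo():
    router = Router(sender_ids=[0, 1, 2], receiver_ids=[9])
    router.send(0, 9, "a")
    router.send(1, 9, "b")
    router.send(0, 9, "c")
    while router.step():
        pass
    received = []
    while (msg := router.outboxes[9].try_pop()) is not None:
        received.append(msg)
    return received
```

Messages from the same sender stay in order, and the interleaving between senders is whatever the router's polling order produces. The extra hop through the router is the latency cost mentioned above.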
EDIT: I may be completely wrong about the first paragraph; it looks like the TESTSET instruction might actually be usable on remote addresses. I assumed it wasn't, because the architecture documentation doesn't say anything about how such a capability would be implemented. But if it works, it would drastically simplify inter-node communication.
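If a remote test-and-set does work, multiple writers could share one buffer behind a simple spinlock. The sketch below is a Python model only, and it assumes a test-and-set that writes the new value only when the word is zero and returns the previous value; whether the hardware instruction behaves this way on remote addresses is exactly the open question. `Word`, `append_locked` and `run_writers` are hypothetical names, and the `threading.Lock` merely stands in for the atomicity the hardware would have to provide.

```python
import threading


class Word:
    """One memory word with a modelled atomic test-and-set (not real hardware)."""

    def __init__(self):
        self.value = 0
        self._atomic = threading.Lock()  # stand-in for hardware atomicity

    def test_and_set(self, new_value):
        """Write new_value only if the word is zero; return the old value."""
        with self._atomic:
            old = self.value
            if old == 0:
                self.value = new_value
            return old

    def clear(self):
        self.value = 0  # a plain store releases the lock


def append_locked(lock_word, buffer, writer_id, item):
    # Spin until we are the writer that flipped the word from zero.
    while lock_word.test_and_set(writer_id) != 0:
        pass
    try:
        buffer.append((writer_id, item))
    finally:
        lock_word.clear()


def run_writers(num_writers=4, per_writer=50):
    lock_word, buffer = Word(), []

    def writer(writer_id):
        for i in range(per_writer):
            append_locked(lock_word, buffer, writer_id, i)

    # Writer ids start at 1 because 0 means "unlocked".
    threads = [threading.Thread(target=writer, args=(w,))
               for w in range(1, num_writers + 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return buffer
```

Compared with the router approach, this needs no dedicated cores, but every writer spins on the same word, so contention grows with the number of writers.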