by bayindirh 692 days ago
It's not only "zero copy networking".

In an IB network, two cards connect point to point through the switch and "beam" one's RAM contents to the other. On top of that, with accelerated MPI, certain collective operations (broadcast, sum, etc.) are offloaded to the IB cards and IB switches, so the MPI library running on the host doesn't have to handle or worry about them, which leaves time and processor cycles for the computation itself.
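To make the offload idea concrete, here is a toy model in plain Python. It makes no real InfiniBand or MPI calls; `Switch`, `Host` and `allreduce_offloaded` are names invented for this sketch. It only illustrates the bookkeeping: the sum happens "in the fabric" and the hosts just receive the result.

```python
# Toy model only: no real InfiniBand or MPI calls here.
# Switch, Host and allreduce_offloaded are hypothetical names for illustration.

class Switch:
    """Stands in for a switch that can sum vectors inside the fabric."""

    def __init__(self):
        self.adds = 0  # additions performed inside the "network"

    def reduce_and_broadcast(self, vectors):
        total = [0] * len(vectors[0])
        for vec in vectors:
            for i, x in enumerate(vec):
                total[i] += x
                self.adds += 1
        # Every host receives its own copy of the finished result.
        return [list(total) for _ in vectors]


class Host:
    def __init__(self, data):
        self.data = data
        self.adds = 0      # additions performed by the host CPU
        self.result = None


def allreduce_offloaded(hosts, switch):
    results = switch.reduce_and_broadcast([h.data for h in hosts])
    for host, res in zip(hosts, results):
        host.result = res  # the host only sees the final answer


hosts = [Host([1, 2, 3]), Host([10, 20, 30]), Host([100, 200, 300])]
switch = Switch()
allreduce_offloaded(hosts, switch)

print(hosts[0].result)             # [111, 222, 333]
print(sum(h.adds for h in hosts))  # 0
print(switch.adds)                 # 9
```

In this model the host-side addition count stays at zero, which is the point of the offload: those cycles are free for the actual computation.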

This is the magic I'm talking about.

2 comments

IB didn't invent RDMA, and it's not even the only way to do it today.

It's also not amazingly great, since it only solves a small fraction of the cluster-communication problem. (That is, almost no program can rely on magic RDMA getting everything where it needs to be. There will always be at least some corresponding "heavyweight" messaging, since you still need locks and other synchronization.)
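A rough sketch of that synchronization point, using only Python threads as a stand-in. The assumptions here are mine: a `bytearray` plays the role of the remotely writable memory region, a thread plays the remote NIC, and a `threading.Event` plays the separate notification message. None of this is real RDMA.

```python
import threading

# Toy model: nothing here touches a real NIC or RDMA library.
region = bytearray(8)      # stand-in for the remotely writable buffer
ready = threading.Event()  # stand-in for the separate "heavyweight" message


def remote_writer():
    region[:] = b"payload!"  # one-sided write: the receiver's code is not involved
    ready.set()              # explicit notification that the data is complete


t = threading.Thread(target=remote_writer)
t.start()
ready.wait()                 # without this, the receiver could read too early
received = bytes(region)
t.join()

print(received)              # b'payload!'
```

The write itself needs no cooperation from the receiver, but the receiver still has to be told when it is safe to read, and that is the extra messaging.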

I’ve used other peripherals that did this. Under the hood you would have a virtual mapping to a physical address and extent, where the virtual mapping is in the address space of your process. This is how DMA works in QNX, because drivers are userspace processes. The special thing here is essentially doing the math in the same process as the driver.
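As a loose illustration of the mapping idea, the sketch below uses an anonymous `mmap` in Python. This is an assumption-heavy stand-in: a real driver would map a device's physical address range, whereas this maps ordinary anonymous memory. It only shows that two views of one mapping share the same bytes without copying.

```python
import mmap

# Anonymous mapping as a stand-in for a device buffer mapped into the process.
# A real DMA setup would map a physical address range; this does not.
buf = mmap.mmap(-1, 4096)

view_a = memoryview(buf)  # think of this as the "driver" side
view_b = memoryview(buf)  # think of this as the "compute" side

view_a[0:4] = b"\x01\x02\x03\x04"  # written once, through one view
total = sum(view_b[0:4])           # read through the other view, no copy of the mapping
print(total)                       # 10

# Views must be released before the mapping can be closed.
view_a.release()
view_b.release()
buf.close()
```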

I agree that sounds very nice for distributed computation.

> The special thing here is essentially doing the math in the same process as the driver.

No, you're doing MPI operations on the switch fabric and the IB ASIC itself. The CPU doesn't touch these operations; it only sees the result. NVIDIA's DPU is just a more general-purpose version of this.