by rcxdude 692 days ago
If you're waiting for IO, you're likely getting booted off the processor by the OS anyway. SMT is most useful when your code doesn't have enough instruction-level parallelism but is still mostly compute bound.

I believe "I/O" here is referring to data movement between DRAM and registers. Not drives or NICs.
Yes, exactly. One exception can be Infiniband, since it can write received data directly to RAM, without CPU intervention.
DMA is a much older technology. It's just that at some point you do need the CPU to actually look at it.
Infiniband uses RDMA, which is different from ordinary DMA. Your IB card sends the data point to point to the client, and the receiving IB card writes it directly to RAM. The IB driver notifies you that the data has arrived (generally via IB-accelerated MPI), and you directly LOAD your data from the memory location [0].

IOW, your data magically appears in your application's memory, at the correct place. This is what makes Mellanox special, and why NVIDIA acquired them.

From the linked document:

Instead of sending the packet for processing to the kernel and copying it into the memory of the user application, the host adapter directly places the packet contents in the application buffer.

[0]: https://docs.redhat.com/en/documentation/red_hat_enterprise_...

Linux has had zero copy network support for 15 years. No magic.
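
For anyone unsure what the kernel-side zero copy path looks like, here is a minimal sketch using Python's `socket.sendfile`, which uses `os.sendfile` where the platform provides it. The helper name `send_file_zero_copy` and the socketpair setup are my own illustration, not anything from the linked document. Note that this only avoids a user-space buffer on the send side; the receiver still gets its data through an ordinary kernel-to-user copy, so this is not RDMA.

```python
import socket
import tempfile


def send_file_zero_copy(payload: bytes) -> bytes:
    """Send a file's contents over a socket with sendfile and read them back.

    Hypothetical helper for illustration. Single-threaded, so keep the
    payload small enough to fit in the socket buffer.
    """
    sender, receiver = socket.socketpair()  # two connected stream sockets
    with sender, receiver:
        with tempfile.TemporaryFile() as f:
            f.write(payload)
            f.flush()  # make sure the data is visible through the file descriptor
            # The kernel moves the file's pages to the socket; the data never
            # passes through a buffer owned by this process on the send side.
            sent = sender.sendfile(f)

        # Receive side: a normal copy from the kernel into our buffer.
        buf = bytearray(sent)
        view = memoryview(buf)
        got = 0
        while got < sent:
            n = receiver.recv_into(view[got:])
            if n == 0:
                break
            got += n
        return bytes(buf[:got])


payload = bytes(range(256)) * 16  # 4096 bytes
echoed = send_file_zero_copy(payload)
```
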
It's not only "zero copy networking".

In an IB network, two cards connect point to point over the switch and "beam" one's RAM contents to the other. On top of that, with accelerated MPI, certain operations (like broadcast, sum, etc.) are offloaded to the IB cards and IB switches, so the MPI library running on the host doesn't have to handle or worry about these operations, leaving time and processor cycles for the computation itself.

This is the magic I'm talking about.
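
To make the "sum" collective concrete, here is a toy pure-Python model of what a sum all-reduce computes. It uses no MPI library and no IB hardware, and the function name `allreduce_sum` is made up for this sketch. In the offloaded case described above, the reduce and broadcast steps would happen on the cards and switches instead of in host code like this.

```python
from typing import List


def allreduce_sum(contributions: List[List[float]]) -> List[List[float]]:
    """Toy model of a sum all-reduce: every rank ends up with the elementwise sum.

    contributions[i] is the vector contributed by rank i.
    """
    if not contributions:
        return []
    length = len(contributions[0])
    if any(len(v) != length for v in contributions):
        raise ValueError("all ranks must contribute vectors of equal length")
    # Reduce step: combine the vectors from all ranks element by element.
    total = [sum(column) for column in zip(*contributions)]
    # Broadcast step: each rank receives its own copy of the result.
    return [list(total) for _ in contributions]


ranks = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
result = allreduce_sum(ranks)
```

With a real MPI library the host would make a single all-reduce call and, with offload, spend the intervening cycles on computation.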