If you're waiting for IO, you're likely getting booted off the processor by the OS anyway. SMT is most useful when your code doesn't have enough instruction-level parallelism but is still mostly compute bound.
InfiniBand uses RDMA, which is different from ordinary DMA. Your IB card sends the data to the client point to point, and the receiving IB card writes it directly to RAM. The IB driver notifies you that the data has arrived (generally via IB-accelerated MPI), and you directly LOAD your data from the memory location [0].
IOW, your data magically appears in your application's memory, at the correct place. This is what makes Mellanox special, and what led NVIDIA to acquire them.
From the linked document:
Instead of sending the packet for processing to the kernel and copying it into the memory of the user application, the host adapter directly places the packet contents in the application buffer.
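To make the difference in that quote concrete, here is a toy model in Python. It is only an analogy: plain bytearrays stand in for kernel and application memory, there is no real adapter or RDMA API involved, and the function names are made up for illustration. The only point is where the extra CPU copy happens.

```python
# Toy model only: bytearrays stand in for kernel and application memory.
# Function names are hypothetical; no real RDMA or verbs API is used here.

def receive_via_kernel(packet: bytes, app_buffer: bytearray) -> int:
    """Conventional path: adapter -> kernel buffer -> application buffer."""
    kernel_buffer = bytearray(packet)             # adapter places data in kernel memory
    app_buffer[:len(packet)] = kernel_buffer      # CPU copies it into user space
    return 1                                      # CPU copies performed

def receive_via_rdma(packet: bytes, app_buffer: bytearray) -> int:
    """RDMA-style path: adapter writes straight into the application buffer."""
    app_buffer[:len(packet)] = packet             # stands in for the adapter's own write
    return 0                                      # no CPU copy on the host

packet = b"payload"
buf_a = bytearray(len(packet))
buf_b = bytearray(len(packet))
copies_kernel = receive_via_kernel(packet, buf_a)
copies_rdma = receive_via_rdma(packet, buf_b)
print(bytes(buf_a) == bytes(buf_b), copies_kernel, copies_rdma)
```

Both buffers end up with the same bytes; what changes is whether the host CPU had to touch the data on the way in.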
In an IB network, two cards connect point to point over the switch and "beam" one's RAM contents to the other. On top of that, with accelerated MPI, certain operations (like broadcast, sum, etc.) are offloaded to the IB cards and IB switches, so the MPI library running on the host doesn't have to handle these operations, leaving time and processor cycles for the computation itself.
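For anyone unfamiliar with what "broadcast" and "sum" mean here, this is a rough pure-Python sketch of what those collectives compute. It is not MPI and involves no network: each list element stands in for one rank's local value, and the helper names are hypothetical. In real MPI these correspond to MPI_Bcast and MPI_Allreduce with MPI_SUM; the offload only changes where the work is done, not the result.

```python
# Hypothetical model: each list element is the local value held by one rank.

def broadcast(values, root=0):
    """Every rank ends up with the root rank's value."""
    return [values[root]] * len(values)

def allreduce_sum(values):
    """Every rank ends up with the sum over all ranks."""
    total = sum(values)
    return [total] * len(values)

local = [1.0, 2.0, 3.0, 4.0]        # one value per rank, four ranks
after_bcast = broadcast(local)      # all ranks now hold rank 0's value
after_sum = allreduce_sum(local)    # all ranks now hold the global sum
print(after_bcast, after_sum)
```

Without offload, the host CPUs do this combining and forwarding themselves; with it, the cards and switches do it while the hosts keep computing.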