And yet an Ethernet frame, by design, is larger than an InfiniBand frame (think layer 2). When it comes down to node-to-node latency, given perfectly equal silicon, InfiniBand will still be faster.
I think the minimum size of an IB packet with no payload is 26 octets, vs. 64 octets for an Ethernet frame. So sure, a difference of 38 octets, but at, say, 100 Gbit/s, that's only about 3 ns, much, much less than the IB vs. Ethernet latency difference. So I think you'll have to look somewhere else for an explanation.
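As a quick check of that arithmetic, here is a minimal sketch of the serialization delay for the extra 38 octets. The 26-octet figure is the one quoted above and is not verified against the spec; the function and constant names are made up for illustration, and preamble and inter-frame gap are ignored.

```python
def serialization_delay_ns(octets: int, link_gbps: float) -> float:
    """Time to clock `octets` onto a link running at `link_gbps` Gbit/s, in ns."""
    bits = octets * 8
    # bits divided by Gbit/s gives nanoseconds directly
    return bits / link_gbps


IB_MIN_OCTETS = 26   # figure quoted in the thread, assumed rather than verified
ETH_MIN_OCTETS = 64  # minimum Ethernet frame, without preamble or inter-frame gap

extra_ns = serialization_delay_ns(ETH_MIN_OCTETS - IB_MIN_OCTETS, 100.0)
print(f"{extra_ns:.2f} ns")  # 3.04 ns
```

So the size difference costs roughly 3 ns per hop at 100 Gbit/s, which is too small to account for the latency gap on its own.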
I have no idea what it is, actually. Some ideas that may or may not matter (or might not even be correct):
- IB is a couple of decades younger, so it could benefit from accumulated knowledge of how to build fast protocols. (Not an explanation per se)
- Simpler forwarding. In IB the subnet manager gives out the LIDs that are used for routing within a subnet. They are shorter than an Ethernet MAC address (16 vs. 48 bits), so the lookup circuits in the switches can be smaller and faster(?). Also, since the LIDs are assigned by the subnet manager rather than being burned in at the factory, they can be distributed taking the subnet topology into account, allowing switches to use LID Mask Control (LMC) filtering. Similarly, all routes within a subnet are calculated statically, a priori, by the subnet manager (load balancing among multiple paths is only static round robin, not dynamic or load dependent), and don't have to be calculated on the fly by the switches.
- FEC rather than retransmission in case of corruption.
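
To make the "simpler forwarding" idea a bit more concrete, here is a rough sketch of what a subnet-manager-filled table indexed directly by destination LID could look like, including a port that answers to several LIDs via LMC. This is a simplified model under stated assumptions, not how any real switch or subnet manager is implemented: the function names, the port numbers, and the LID values are all hypothetical.

```python
from typing import Dict, List, Optional

LID_SPACE = 1 << 16  # LIDs are 16 bits wide


def lids_for_port(base_lid: int, lmc: int) -> range:
    """LIDs a port answers to: 2**lmc consecutive values from an aligned base.

    Assumption: the low `lmc` bits of the base LID are zero.
    """
    count = 1 << lmc
    if base_lid % count != 0:
        raise ValueError("base LID must be aligned to 2**lmc")
    return range(base_lid, base_lid + count)


def build_forwarding_table(routes: Dict[int, int]) -> List[Optional[int]]:
    """Precompute a flat table: index = destination LID, value = egress port."""
    table: List[Optional[int]] = [None] * LID_SPACE
    for dlid, egress_port in routes.items():
        table[dlid] = egress_port
    return table


# Hypothetical end node with base LID 8 and LMC 2, so it owns LIDs 8..11.
# The (hypothetical) subnet manager spreads those LIDs over egress ports 1 and 2
# in static round-robin fashion, giving senders a choice of fixed paths.
node_lids = lids_for_port(base_lid=8, lmc=2)
routes = {lid: 1 + (i % 2) for i, lid in enumerate(node_lids)}
table = build_forwarding_table(routes)

# Per-packet work in the switch is a single array index, with no learning or hashing.
print([table[lid] for lid in node_lids])  # [1, 2, 1, 2]
```

The point of the sketch is only that a 16-bit, centrally assigned address allows a direct-indexed table that is filled ahead of time, whereas a 48-bit factory-assigned MAC generally needs some form of learned, hashed or associative lookup.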
Sure, IB is simply a superior fabric for its niche.
For everything else, RDMA over Ethernet buys you the ability to reuse your existing L2, and this matters way, way more to people running DC businesses than anything else.