by Veserv
238 days ago
I am aware of how network protocol stacks work. Getting 200 Gb/s of reliable in-order bytestream per core over an unreliable, out-of-order packet-switched network using standard Ethernet is not very hard with proper protocol design. If memory copying is not your bottleneck (ignoring encryption), then your protocol is bad. Hardware crypto acceleration and a hardware memory copy engine do not constitute an RDMA engine. The API I am describing is the receiver programming into a device an (address, length) chunk of data to decrypt and a (src, dst, length) chunk of data to move, respectively. That is a far cry from a whole hardware network protocol.
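To make the shape of that API concrete, here is a rough Python sketch. Every name in it (`DecryptDesc`, `CopyDesc`, `ToyOffloadDevice`, `receive_demo`) is hypothetical, a single-byte XOR stands in for a real cipher, and a `bytearray` stands in for host memory. It models only the two descriptor types described above, not any real device or driver interface.

```python
from dataclasses import dataclass


@dataclass
class DecryptDesc:
    """(address, length): decrypt this region of memory in place."""
    address: int
    length: int


@dataclass
class CopyDesc:
    """(src, dst, length): move this region of memory."""
    src: int
    dst: int
    length: int


class ToyOffloadDevice:
    """Software stand-in for the offload hardware (hypothetical)."""

    def __init__(self, memory: bytearray, key: int):
        self.memory = memory
        self.key = key      # XOR key; a placeholder for real crypto state
        self.queue = []

    def submit(self, desc):
        # The receiver programs descriptors into the device.
        self.queue.append(desc)

    def run(self):
        # Process descriptors in submission order.
        for desc in self.queue:
            if isinstance(desc, DecryptDesc):
                end = desc.address + desc.length
                chunk = self.memory[desc.address:end]
                self.memory[desc.address:end] = bytes(b ^ self.key for b in chunk)
            else:
                chunk = self.memory[desc.src:desc.src + desc.length]
                self.memory[desc.dst:desc.dst + desc.length] = chunk
        self.queue.clear()


def receive_demo() -> bytes:
    key = 0x5A
    payload = b"hello"
    mem = bytearray(32)
    # Pretend an encrypted packet payload landed at offset 0.
    mem[0:len(payload)] = bytes(b ^ key for b in payload)

    dev = ToyOffloadDevice(mem, key)
    dev.submit(DecryptDesc(address=0, length=len(payload)))
    dev.submit(CopyDesc(src=0, dst=16, length=len(payload)))
    dev.run()
    return bytes(mem[16:16 + len(payload)])
```

All of the protocol logic (ordering, acknowledgements, retransmission) would still live in software on the host; the device in this sketch only ever sees the two descriptor types.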
|
You also suggested that this can be done using a single CPU core. It seems to me that this proposal involves custom APIs (not sockets), and even if viable with a single core in the common case, would blow up in case of loss/recovery/retransmission events. Falcon provides a mostly lossless fabric with loss/retransmits/recovery taken care of by the fabric: the host CPU never handles any of these tail cases.
Ultimately there are two APIs for networks: sockets and verbs. The former is great for simplicity, compatibility, and portability; the latter is the standard for when you are willing to break compatibility for performance.