by startupsfail 483 days ago
It’s not. When you are a high-frequency trader and you’ve mastered RDMA, everything around you looks slow. You are thinking in terms of 20-nanosecond intervals, while everyone around you still thinks that serving a query in under a millisecond is fast.

Huh? What kind of RDMA has a completion latency of 20 nanoseconds? It's more like 5 microseconds.

I agree that a lot of the "modern" storage stack is way too slow, though. Last year I tried to find a replication-first object store for crazy-fast random reads over a small number of objects and found none.

Completion latency is one thing; bandwidth is another. There's apparently a whole world of Alveo SmartNICs and related FPGA platforms, and they can totally get into the nanosecond range for whatever nails fit the compute-in-network hammer, even if they are bound by the latency of the consuming system / RDMA interface. Also: https://github.com/corundum/corundum is really popular with the Chinese!
I was talking about thinking in terms of 20-nanosecond intervals, rather than completing a request in 20 nanoseconds. To get 1 microsecond wire-to-wire latency, you do need to count your nanoseconds.

Why this number? Because it’s roughly the time it takes to read 64 bytes from L3 cache, and NICs tend to be able to push data into L3 (or equivalents).
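
To put rough numbers on that, here is a minimal back-of-the-envelope sketch. It assumes the figures quoted above (about 20 ns per 64-byte L3 read, a 1 microsecond wire-to-wire budget) and treats the reads as strictly serial; the names are made up for illustration and nothing here is a measurement.

```python
# Assumed figures from the comment above; illustrative, not measured.
WIRE_TO_WIRE_BUDGET_NS = 1_000  # 1 microsecond wire-to-wire target
L3_LINE_READ_NS = 20            # rough cost of one 64-byte read from L3
CACHE_LINE_BYTES = 64


def l3_reads_in_budget(budget_ns: int, read_ns: int = L3_LINE_READ_NS) -> int:
    """Number of back-to-back (dependent) L3 line reads that fit in the budget."""
    return budget_ns // read_ns


reads = l3_reads_in_budget(WIRE_TO_WIRE_BUDGET_NS)
print(reads)                     # 50 serial L3 reads in 1 microsecond
print(reads * CACHE_LINE_BYTES)  # 3200 bytes touched, if nothing overlaps
```

So under these assumptions the whole budget is about 50 dependent cache-line reads, which is why each 20 ns interval gets counted.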

Current state of the art: look up nanoPU, from Stanford. Wire-to-wire under 100 ns is not impossible, but this would normally assume a pre-cooked packet, selected from a number of prepared packets (which is not an unusual scenario in HFT).

Ah, makes sense. Sadly RDMA isn't that fast for now, or at least commercial RNICs/switches aren't :( Once you leave your host and go out onto the data center network, everything is counted in microseconds.