| The answer to that question is not quite as straightforward as you might think. In many ways, this experiment/post is about figuring out the answer to the question "what is the best the hardware can do?" I originally started running these tests on the c5.xlarge (not c5n.xlarge) instance type, which is capable of a maximum of 1M packets per second. That is an artificial limit set by AWS at the network hardware level. Mind you, it is not an arbitrary limit; I am sure they weighed several factors to decide which limits make the most sense based on instance size, customer use cases, and overall network health. If I had to hazard a guess, I would say that 99% of AWS customers don't even begin to approach that limit, and those that do are probably doing high-speed routing and/or using UDP. Virtually no one would have been hitting 1M req/s with 4 vCPUs doing synchronous HTTP request/response over TCP. Those that did would have been using a kernel bypass solution like DPDK. So this blog post is actually about trying to find "the limit", which is in quotes because it is qualified by multiple conditions: (1) TCP, (2) request/response, and (3) the standard kernel TCP/IP stack. While working on the post, I actively looked for a network performance testing tool that would let me determine the upper limit for this TCP request/response use case. I looked at netperf, sockperf, and uperf (iPerf doesn't do req/resp). For the TCP request/response case they were *all slower* than wrk+libreactor, so it was up to me to find the limit. When I realized that I might hit the 1M req/s limit, I switched to the c5n.xlarge, whose hardware limit is 1.8M pps. Again, this is just a limit set by AWS. Future tests using a Graviton2 instance + io_uring + a kernel recompiled with profile-guided optimizations might allow us to push past the 1.8M pps limit. Future instances from AWS may just raise the pps limit again... Either way, it should be fun to find out. |