by TimWolla 1221 days ago
71M requests per second gives you about 14ns per request. WolframAlpha says that's about 4 times the latency of an L2 cache access and 0.8 times the latency of a mutex lock / unlock: https://www.wolframalpha.com/input?i=1+second+%2F+71+million. So even if you have a massive amount of cores in that machine, you still don't really have much time to spend on a single request and a single mutex operation will exceed your budget.

For comparison: HAProxy was able to deal with 2M requests per second on a single machine in 2021: https://www.haproxy.com/de/blog/haproxy-forwards-over-2-mill... (Disclosure: I'm a HAProxy community contributor).
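The arithmetic behind those two figures, as a rough sketch. It assumes requests are handled strictly one after another and that the 2M figure scales linearly across machines, which is a simplification; the function names are made up for illustration.

```python
import math

PEAK_RPS = 71_000_000     # requests per second, figure quoted above
HAPROXY_RPS = 2_000_000   # single-machine HAProxy figure quoted above


def budget_ns(requests_per_second: float) -> float:
    """Average time available per request in nanoseconds,
    assuming requests are processed one at a time."""
    return 1e9 / requests_per_second


def machines_needed(target_rps: int, per_machine_rps: int) -> int:
    """Machines required at a given per-machine rate,
    assuming (optimistically) perfect linear scaling."""
    return math.ceil(target_rps / per_machine_rps)


print(f"{budget_ns(PEAK_RPS):.1f} ns per request at 71M rps")
print(f"{budget_ns(HAPROXY_RPS):.1f} ns per request at 2M rps")
print(f"{machines_needed(PEAK_RPS, HAPROXY_RPS)} machines at 2M rps each")
```

So at the 2021 HAProxy rate you'd be looking at a few dozen machines rather than one, before accounting for anything other than raw request throughput.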

2 comments

> So even if you have a massive amount of cores in that machine, you still don't really have much time to spend on a single request and a single mutex operation will exceed your budget.

fortunately it's no longer 2000 and I have more than one core, and my NIC has more than one queue

generating requests is a lot less CPU intensive than parsing requests

how much CPU do I have to spend to get a pre-formed 100 byte request into the NIC's queue? not much at all

(the TCP negotiation will likely be the bottleneck)
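to put a rough number on the "more than one core" point: if the work spreads evenly over independent cores with no shared locks, each core's budget grows linearly with the core count. the 64-core figure below is a hypothetical, not something from the thread, and perfect scaling is an assumption.

```python
TOTAL_RPS = 71_000_000  # requests per second, figure from the thread
CORES = 64              # hypothetical core count, chosen for illustration


def per_core_budget_ns(total_rps: float, cores: int) -> float:
    """Time each core has per request in nanoseconds, assuming the
    load is split evenly and cores never wait on each other."""
    return cores * 1e9 / total_rps


print(f"{per_core_budget_ns(TOTAL_RPS, 1):.1f} ns with 1 core")
print(f"{per_core_budget_ns(TOTAL_RPS, CORES):.1f} ns with {CORES} cores")
```

with one core that's the ~14ns from the parent comment; any shared state between cores eats into the larger per-core figure.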

Could you do this and write it up? I'd love to see how you tune this stuff in hardware and software to get that sort of throughput.

These were HTTP/2 requests, not connections, so no per-request TCP negotiation. I'd say it's very easy to generate a huge number of requests like this, even if it's over TLS.

I interpreted the "put out" in the initial comment as in "put out a fire" (i.e. mitigate), not as in "send out".

But it's not one machine if you do anycast.

It's a lot, but at Cloudflare's scale you either have the budget for a crapton of machines per point of presence (hope I'm using the term correctly) or custom hardware that can deal with this sort of thing. It's kind of their core business.