by sknat 1975 days ago
I'm a bit dubious about the networking results they present. I did some quite extensive network performance testing last winter on those three CSPs, and even if single-queue TCP+GSO performance can behave like this, I find the claim 'GCP is 3x faster than AWS' a bit bold. It's definitely possible to get 50G of TCP traffic in AWS, and a lot of things are in the balance (MTU, number of queues, drivers...) that make this claim a bit weird to me.

One of the engineers who helped run benchmarks and compile the report here. It’s worth noting that for the majority of the machines we benchmarked on AWS, their tested bandwidth met the published AWS expectations. You may have noticed that some of the “network optimized” machines fell short of the published expectations though, and there’s an explanation in the report about how we tried to validate our findings.

As you point out, there are a variety of variables that could be tuned to eke out better performance here, and they could bring the two clouds closer. Our claim, of course, only applies to the benchmark configuration we tested with. That being said, with the size of machine we were restricting our testing to (16 vCPUs), no AWS machine claimed to offer more than 25G of throughput.

All the 16 vCPU "n" instances (m5n, c5n, r5n, etc) are capable of hitting the 25 Gbps limit easily. In your report, all of the AWS results are limited to either 5 Gbps or 10 Gbps, but this is because of a very specific test condition.

From my understanding of the test scenario, you are using a single TCP connection to run the throughput test, and hitting AWS' documented[1] throughput limit for a single flow: 10 Gbps if the two instances are in the same placement group and 5 Gbps otherwise. The reason some of the network-optimized instances were "slower" than the non-optimized ones is most likely a random draw of whether both instances in the test were physically close to each other (basically whether or not they are accidentally in placement groups).

To show the true throughput you would need to use multiple connections/flows; 5-10 would probably suffice. If the single flow test case was important then maybe you should have mentioned that AWS has a specific limitation around this. Personally I don't think a single flow test case is particularly realistic for a throughput test. Either way, how it is presented is pretty misleading.

1. "Single TCP flow is limited to 10 Gbps for instances in the same placement group and 5 Gbps between instances anywhere else." https://docs.aws.amazon.com/whitepapers/latest/ec2-networkin...
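To make the arithmetic concrete, here is a rough sketch of the best-case ceiling you'd expect for a given number of flows. It only uses the per-flow caps quoted above (10 Gbps inside a placement group, 5 Gbps otherwise) and the 25 Gbps instance limit mentioned in this thread. The function name and the "flows spread perfectly and nothing else bottlenecks" assumption are mine, so treat it as an idealized upper bound, not a prediction of what a benchmark will measure.

```python
def expected_ceiling_gbps(flows: int,
                          same_placement_group: bool,
                          instance_limit_gbps: float = 25.0) -> float:
    """Idealized upper bound on aggregate TCP throughput, in Gbps.

    Assumes (hypothetically) that every flow can reach its per-flow cap
    and that the only other limit is the instance's advertised bandwidth.
    """
    if flows < 1:
        raise ValueError("flows must be at least 1")
    # Per-flow caps as quoted from the AWS whitepaper in this thread.
    per_flow_cap_gbps = 10.0 if same_placement_group else 5.0
    return min(instance_limit_gbps, flows * per_flow_cap_gbps)


if __name__ == "__main__":
    for flows in (1, 2, 3, 5, 10):
        inside = expected_ceiling_gbps(flows, same_placement_group=True)
        outside = expected_ceiling_gbps(flows, same_placement_group=False)
        print(f"{flows:>2} flows: {inside} Gbps in a placement group, "
              f"{outside} Gbps otherwise")
```

Under those assumptions a single flow tops out at 5 or 10 Gbps, and it takes at least 3 flows inside a placement group (or 5 outside one) before the 25 Gbps instance limit becomes the bottleneck.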

Right, this also aligns with our testing. The number of queues available also seems to play a role. But you're right, RSS should spread traffic quite nicely when using 5-10 flows.
Do you know why they place this limit?