by WestCoastJustin 2742 days ago
There might be a networking cap and disk I/O issues with the instances you picked on GCP vs AWS.

The GCP instance has 8 Gbps vs 10 Gbps for AWS. Without seeing the graphs from the instances, I can't tell whether you hit a cap, but this could make a difference in both transfer speeds and latency numbers for GCP. Also, for your local disk test, on GCP, disk size makes a difference to get the best performance. The larger the disk, the better the performance. PD (Persistent Disk) read/write performance also comes out of the available network bandwidth! So, the instance you picked on GCP was at a disadvantage right from the start [3]. This likely explains the I/O Experiment graph and the "67x difference in throughput", as you're likely hitting caps in both network bandwidth and disk performance compared to AWS. Anything showing a 67x difference is a pretty big red flag that something strange is going on and needs further investigation.

GCP's n1-standard-16 = 8 Gbps max [1]

AWS's c5d.4xlarge = 10 Gbps max [2]

I guess the problem with comparing clouds is that it is never apples to apples, and I don't fault you for picking what you did (as it is not obvious). GCP typically gives you (core count / 2) Gbps of network bandwidth. A good follow-up to your comparison might be to investigate why the numbers are different. Does adding more CPUs, memory, or network bandwidth increase performance?
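To make the rule of thumb above concrete, here is a minimal Python sketch. The formula is the one stated in this comment, not an official GCP specification, and the function name is hypothetical. The 10 Gbps AWS figure is the one cited from [2].

```python
def gcp_bandwidth_rule_of_thumb_gbps(core_count: int) -> float:
    """Rule of thumb from the comment: (core count / 2) Gbps. An assumption, not an official formula."""
    return core_count / 2


# n1-standard-16 has 16 vCPUs; the comment cites an 8 Gbps cap for it.
gcp_cap_gbps = gcp_bandwidth_rule_of_thumb_gbps(16)

# c5d.4xlarge figure as cited in the comment.
aws_cap_gbps = 10.0

# Fraction of the AWS cap that the GCP instance can reach under this rule.
cap_ratio = gcp_cap_gbps / aws_cap_gbps

print(f"GCP cap: {gcp_cap_gbps} Gbps, AWS cap: {aws_cap_gbps} Gbps, ratio: {cap_ratio}")
```

A gap of this size is nowhere near large enough to produce a 67x result on its own, which is why the caps are only a starting point for further investigation.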

[1] https://cloud.google.com/blog/products/gcp/5-steps-to-better... (see section #3).

[2] https://aws.amazon.com/blogs/aws/ec2-instance-update-c5-inst...

[3] https://cloud.google.com/compute/docs/disks/performance#size... (see the table re: disk size to bandwidth)

4 comments

> Also, for your local disk test, on GCP, disk size makes a difference to get the best performance. The larger the disk, the better the performance. Disk read/write performance also comes out of the available network bandwidth

Do you have a source for local SSD performance coming out of the available network bandwidth? According to GCP docs [1], this only applies to persistent disks. Local SSD perf depends only on disk size and choice of SCSI/NVMe interface.

According to another GCP doc [2], local SSDs are all 375 GB in size. For comparison, c5d.4xlarge has 400 GB, which is very close. So I don't see anything wrong in the benchmark unless they messed up and ran it against the persistent root disk instead of the local SSD.

[1] https://cloud.google.com/compute/docs/disks/performance#type...

[2] https://cloud.google.com/compute/docs/disks/#localssds

You are right. Sorry for the confusion. It is only PD (Persistent Disk) that comes out of network bandwidth. Anything on the NVMe SSD would be totally local to the machine (no network caps, etc). The article doesn't really say whether they are using SSD PD or SSD NVMe for GCP. Also, disk size does matter for NVMe SSD performance, as you can stripe them together by adding more (up to 4). You can see the numbers by using the console and playing around with adding more NVMe SSDs (via this doc [1]).

  Size      Random IOPS                Throughput limit (MB/s)
  375GB     169,987 (r)  90,000 (w)      663 (r)   352 (w)
  750GB     339,975 (r) 180,000 (w)    1,327 (r)   705 (w)
  1125GB    509,962 (r) 270,000 (w)    1,991 (r) 1,057 (w)
  1500GB    679,950 (r) 360,000 (w)    2,650 (r) 1,400 (w)

[1] https://cloud.google.com/compute/docs/disks/local-ssd

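As a quick check on the striping point, the following Python sketch compares each row of the table above with the single-device row. The figures are copied from the table. The 375 GB device size is the one mentioned elsewhere in this thread, and the variable names are only illustrative.

```python
# Figures copied from the table above:
# (read IOPS, write IOPS, read MB/s, write MB/s) keyed by total size in GB.
LOCAL_SSD_LIMITS = {
    375: (169_987, 90_000, 663, 352),
    750: (339_975, 180_000, 1_327, 705),
    1125: (509_962, 270_000, 1_991, 1_057),
    1500: (679_950, 360_000, 2_650, 1_400),
}

DEVICE_SIZE_GB = 375  # size of one local SSD device, as stated in this thread

base = LOCAL_SSD_LIMITS[DEVICE_SIZE_GB]
for size_gb, limits in LOCAL_SSD_LIMITS.items():
    devices = size_gb // DEVICE_SIZE_GB
    # Ratio of each limit to the single-device figure.
    ratios = [round(value / single, 2) for value, single in zip(limits, base)]
    print(f"{devices} device(s), {size_gb} GB: {ratios}")
```

The ratios stay close to the device count, so the listed limits scale roughly linearly as devices are added.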
I don't understand the throughput numbers given. 5.6 GB/s for GCP and 9.6 GB/s for AWS would be roughly 45 Gbps and 77 Gbps respectively.

I don't know of any instances offering that kind of throughput.

I've personally validated GCP's statement that they offer 2 Gbps/core up to 16 Gbps. I can get 16 Gbps consistently between any two n1-standard-8 instances using iperf.

This generally makes network IO in GCP much cheaper.
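The unit conversion in the comment above is just a multiplication by 8 bits per byte. A small Python sketch, with a hypothetical helper name, using the two throughput figures quoted from the article:

```python
BITS_PER_BYTE = 8


def gigabytes_to_gigabits_per_second(gb_per_s: float) -> float:
    """Convert a rate in GB/s (gigabytes) to Gbps (gigabits)."""
    return gb_per_s * BITS_PER_BYTE


gcp_gbps = gigabytes_to_gigabits_per_second(5.6)
aws_gbps = gigabytes_to_gigabits_per_second(9.6)

print(f"GCP: {gcp_gbps:.1f} Gbps, AWS: {aws_gbps:.1f} Gbps")
# prints "GCP: 44.8 Gbps, AWS: 76.8 Gbps"
```

Both values are far above the 16 Gbps figure mentioned in the comment, which is why reading the article's numbers as gigabytes per second looks implausible.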

I wouldn't read too much into it; it's clearly Gbps. The authors are just sloppy with capitalization. Elsewhere they also talk about iperf having a "128 kb" buffer, which seems unlikely, and the throughput graph says "gb" where the text says "GB".

And then there's "iPerf" and "PING"...

If it were Gbps, then that's certainly weird. I consistently get way better network performance in GCP than in AWS.

GCP caps egress throughput at 2 Gbps per core, up to 16 Gbps:

https://cloud.google.com/vpc/docs/advanced-vpc#measurenetwor...

Are you saying gigabytes or gigabits?

GCP is 2 gigabits/second per core, up to a max of 16 gigabits/second for a single VM. Persistent disks (but not local SSDs) also eat into this network bandwidth.
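The per-core cap described above can be sketched in Python as follows. The 2 Gbps and 16 Gbps constants are the figures quoted in this thread, and the function name is hypothetical. Persistent disk traffic is not modeled here.

```python
PER_CORE_GBPS = 2   # per-core egress figure quoted in this thread
MAX_VM_GBPS = 16    # per-VM ceiling quoted in this thread


def gcp_egress_cap_gbps(cores: int) -> int:
    """Egress cap in gigabits/second under the rule stated above."""
    return min(cores * PER_CORE_GBPS, MAX_VM_GBPS)


for cores in (1, 4, 8, 16):
    print(f"{cores} core(s): {gcp_egress_cap_gbps(cores)} Gbps")
```

Under this rule an 8-core VM already reaches the 16 Gbps ceiling, so adding more cores would not raise the cap further.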