by jedberg 3821 days ago
AWS has a limit on the total throughput any one account can have to S3, so the more CPUs OP adds, the worse OP's performance will be on each one. I suspect the other providers have the same restriction.

I either missed it or OP didn't specify how many instances they were using at once to run their benchmark, but the more instances they used, the worse it will be per node.

This did not seem to be accounted for.

EDIT: OP says below it was from one instance, so what I said doesn't apply to this writeup.

4 comments

This is not the case with Google Cloud Storage. I cannot speak to the other providers.

Google Cloud Storage does not limit read or write throughput with the exception of our "Nearline" product (and even Nearline's limiting can be suspended for additional cost, a feature called "On-Demand I/O").

That's good to know, and it lends credence to my opinion that networking is the area where Google is definitely winning the Cloud Wars(tm).
All the benchmarks were from a single instance.

(Note that I have done some testing from AWS Lambda, where we had 1k lambda jobs all pulling down files from S3 at once. That's a bit harder to benchmark...)

Hi OP, nice writeup! I hope my comment wasn't construed as dismissing the work, just a criticism of one small part.

It sounds like that wouldn't have been a factor, except for the cap you seem to have discovered on Amazon that you called out.

My only suggestion then is you may want to make it explicit that you ran the benchmarks from a single instance.

Thanks! Not at all, it's a great point and something I didn't realize would play into the equation.
Any comments on how it worked out with Lambda?
Reluctant to say much because the benchmarks weren't formal. However...

The throughput correlated directly with how much RAM we allocated to the Lambda function (which presumably means we were sharing the VM with fewer other jobs).

512 MB RAM, 19.5 MB/s

768 MB RAM, 29.8 MB/s

1024 MB RAM, 38.4 MB/s

1536 MB RAM, 43.7 MB/s

Note that this also used the node.js AWS SDK, which is slower to download files than some other APIs.
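
To make the scaling in those numbers easier to see, here is a small sketch that normalises each measurement to throughput per GB of allocated RAM. The figures are the ones quoted above; the function name is made up for illustration.

```python
# Figures quoted in the comment above: (RAM allocated in MB, observed MB/s).
measurements = [(512, 19.5), (768, 29.8), (1024, 38.4), (1536, 43.7)]


def mbps_per_gb_ram(ram_mb, throughput_mbps):
    """Throughput normalised to 1 GB (1024 MB) of allocated RAM."""
    return throughput_mbps / ram_mb * 1024


for ram_mb, mbps in measurements:
    print(f"{ram_mb:>5} MB RAM: {mbps:5.1f} MB/s, "
          f"{mbps_per_gb_ram(ram_mb, mbps):.1f} MB/s per GB of RAM")
```

The first three settings come out at roughly 38-40 MB/s per GB of RAM, while the 1536 MB setting drops to about 29, so the scaling is close to proportional up to 1024 MB and flattens after that.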

Thanks. I'd guess more RAM means a bigger instance type as the host, hence more bandwidth. If this were my goal, I'd try gof3r to stream data from S3.
Do you have any sources or more information about the per-account S3 limits?
I don't have any published sources, it's something they told me, but it's hinted at here: http://docs.aws.amazon.com/AmazonS3/latest/dev/request-rate-...

They explicitly mention the RPS per account limit in that doc, which is related.

RPS to S3 is limited, but not throughput to S3, except by bucket. Higher throughput can be achieved by sharding your data across multiple buckets. Also, it's important to properly namespace your keys within buckets to ensure the data is efficiently distributed across the underlying data partitions.
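
A minimal sketch of what sharding across buckets and namespacing keys could look like, assuming a hash-derived prefix is used to spread keys out. The bucket names, the function name, and the choice of MD5 are all assumptions for illustration, not something specified in this thread.

```python
import hashlib

# Hypothetical bucket names, for illustration only.
BUCKETS = ["example-data-0", "example-data-1", "example-data-2", "example-data-3"]


def shard_location(key, buckets=BUCKETS, prefix_len=4):
    """Pick a bucket and a hash-prefixed key name for a logical object key.

    The hash is used only to spread keys evenly, not for security.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    # The same logical key always maps to the same bucket.
    bucket = buckets[int(digest, 16) % len(buckets)]
    # A short hex prefix keeps sequential names (dates, counters) from
    # clustering under one common key prefix.
    return bucket, f"{digest[:prefix_len]}/{key}"


bucket, sharded_key = shard_location("logs/2015/01/01/part-0001.gz")
print(bucket, sharded_key)
```

Because the mapping is deterministic, a reader can recompute the bucket and key from the logical name without a lookup table. The trade-off is that listing objects in their original order is no longer a simple prefix scan.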
Unless that is a semi-recent change, that is not what I've been explicitly told. To be fair my information is at least two years old now.
My experience is solely based on recent production workloads attempting to pull TBs of data out of S3 very quickly to restore data to a less-than-reliable indexed datastore. YMMV.
Can you quote the piece where they mention the RPS per account limit? I cannot find it.
> However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate.

You have to know how to read their docs. :) This is basically code for, "there is a default limit here that you have to get raised if you want to go above it".

The full quote is:

> Amazon S3 scales to support very high request rates. If your request rate grows steadily, Amazon S3 automatically partitions your buckets as needed to support higher request rates. However, if you expect a rapid increase in the request rate for a bucket to more than 300 PUT/LIST/DELETE requests per second or more than 800 GET requests per second, we recommend that you open a support case to prepare for the workload and avoid any temporary limits on your request rate. To open a support case, go to Contact Us.

So this looks like an auto scaling issue. It states "S3 automatically scales to support higher request rates". However, if we know that a bucket is going to need to scale dramatically, we can request, in advance, that the S3 team pre-scales it.
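
As a rough illustration of that reading, the check below flags a planned workload that would cross the per-bucket rates named in the quote. The thresholds are only the numbers from the quoted passage, and the function name is hypothetical.

```python
# Thresholds from the documentation quote above (requests per second, per bucket).
PUT_LIST_DELETE_RPS = 300
GET_RPS = 800


def should_open_support_case(expected_write_rps, expected_get_rps):
    """True if a rapid ramp-up would exceed either rate named in the quote."""
    return (expected_write_rps > PUT_LIST_DELETE_RPS
            or expected_get_rps > GET_RPS)


# Example: a read-heavy job expecting 1000 GET requests per second.
print(should_open_support_case(expected_write_rps=50, expected_get_rps=1000))
```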

I'm sure there is an account limit, but running 1000 CPUs already requires requesting an increase in the account's EC2 instance limit. Are you saying that a team trying to access 150GB of files, or to make 1000 RPS, as the article documents, will hit that limit? From your experience, how big is this hard limit? Is it Netflix scale, or is it GB or TB?

We are routinely pulling a dataset of hundreds of GBs to 100+ instances (1600+ cores) in parallel. We have never noticed throughput going down with the number of nodes. S3 delivers the maximum throughput of 2-4 Gbps per instance very consistently.
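
For a sense of scale, here is a back-of-the-envelope sketch of what those rates imply. The 300 GB dataset size is an assumed stand-in for "hundreds of GBs", sizes are decimal gigabytes, and the helper names are made up.

```python
def transfer_seconds(size_gb, gbps):
    """Seconds to move size_gb gigabytes over a link of gbps gigabits per second."""
    return size_gb * 8 / gbps


def aggregate_gb_per_s(instances, gbps_per_instance):
    """Combined throughput in gigabytes per second across all instances."""
    return instances * gbps_per_instance / 8


DATASET_GB = 300  # assumed size, standing in for "hundreds of GBs"
INSTANCES = 100   # lower bound quoted in the comment above

for gbps in (2, 4):
    print(f"{gbps} Gbps per instance: "
          f"{transfer_seconds(DATASET_GB, gbps):.0f} s per full copy, "
          f"{aggregate_gb_per_s(INSTANCES, gbps):.0f} GB/s across {INSTANCES} instances")
```

Under these assumptions each instance pulls a full copy in 10 to 20 minutes, and the fleet as a whole draws 25 to 50 GB/s from S3.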
Take into account OP's former jobs. I imagine if anyone would run into such a limit, it would be Reddit or Netflix.
If such a limit exists, it would not have been hit on such a small benchmark. However, I am unaware of any such limit and it has never been raised in any discussion I have had with them. I am responsible for a large compute and data storage platform backed by S3.

Is this a limit that is hit anywhere near the 150GB discussed in this article, or is it something that you hit only if you are Netflix? We have TB in S3 and have not observed any limit other than EC2 instance bandwidth.

The amount of data one has in S3 isn't really relevant to the discussion, only how quickly you're trying to pull it into your instances.
OK, then let me rephrase: Is this a limit that is hit anywhere near the 603GB/s figure in this article, or is it something that you hit only if you are Netflix? You seem to be claiming that such a limit exists and that you know what it is. Can you share, or is this NDA territory?