Hacker News
by user5994461 3480 days ago
S3 is ideal for a multi-TB working set.

That should be the de facto standard at TB scale. In fact, if you're at TB scale, don't bother comparing other products; just use S3.


Really?

Say you're going to ETL or MapReduce over all that data many times. You're telling me that reading all of it for processing over S3's REST API (is that the only method?) is ideal, instead of, say, a local array of 15k RPM SAS drives on PCIe HBAs?

It looks pretty expensive and inefficient to me. What am I missing?

In what way would S3 be better than running this on your own gear, if cost and performance (really the big factors in this decision) are clearly not going to be better?
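The performance side of this question comes down to simple arithmetic: every full pass reads the whole dataset, so total scan time grows linearly with the number of passes and shrinks with sustained aggregate read throughput. Below is a minimal sketch of that calculation. The function name `full_scan_hours` and the two throughput values are hypothetical placeholders, not measurements of S3 or of any SAS array; real numbers would have to come from your own benchmarks.

```python
def full_scan_hours(dataset_tb: float, passes: int, throughput_gb_s: float) -> float:
    """Hypothetical helper: wall-clock hours needed to read the whole dataset
    `passes` times at a sustained aggregate throughput.

    Uses decimal units (1 TB = 1000 GB) and ignores request overhead,
    per-request pricing, and compute time; it only models raw read time.
    """
    if throughput_gb_s <= 0:
        raise ValueError("throughput must be positive")
    total_gb = dataset_tb * 1000 * passes  # total bytes read across all passes
    return total_gb / throughput_gb_s / 3600  # seconds -> hours


if __name__ == "__main__":
    # Placeholder rates only: compare two backends on a 10 TB dataset scanned 20 times.
    for label, rate_gb_s in [("backend A", 2.0), ("backend B", 10.0)]:
        print(label, round(full_scan_hours(10, 20, rate_gb_s), 1), "hours")
```

Whichever storage option delivers more sustained aggregate throughput per dollar wins this part of the comparison. Repeated passes multiply any throughput gap, which is why the number of ETL or MapReduce runs matters so much here.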

What you're missing is that S3 is the storage system for Redshift and EMR (EMR = managed Hadoop on AWS).

They are pretty cheap, efficient and simple to use ;)