HN Mirror


by jonburs 5399 days ago

Amazon -- you can use Elastic MapReduce [1] (the easiest way to run a transient Hadoop cluster) or spin up a bunch of EC2 nodes and deploy your own stack. Both methods give you access to very powerful hardware [2].

(Disclaimer: I work for Amazon, but not in AWS, and these are my own personal opinions.)

[1] http://aws.amazon.com/elasticmapreduce/

[2] http://aws.amazon.com/ec2/instance-types/

1 comment

epistasis 5399 days ago

I don't work for Amazon, have hundreds of non-AWS CPUs at my disposal, and still prefer AWS, as long as you don't have to do much I/O.

Once there are a few terabytes of data or more, it's best to be off AWS, because I/O becomes the bottleneck and you'll never be able to saturate your CPUs. Even for smaller datasets of tens of gigabytes, it's kind of iffy.
