
by aksbhat 5528 days ago

I have tried both Hadoop (a 55-node cluster at Cornell) and a single AWS High-Memory Double Extra Large instance with 32 GB of memory.

I have found that, since Twitter's social graph is small enough to fit in memory, a single instance with a large amount of RAM is much more efficient, especially when your algorithm iterates over the nodes in the network.
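To make the iteration point concrete, here is a minimal sketch of an iterative graph algorithm running over an adjacency list held entirely in RAM. It is illustrative only: label propagation is used as a stand-in for "an algorithm that iterates over nodes", and `propagate_labels` and `toy_graph` are hypothetical names, not the code behind the results. Each pass is just a loop over an in-memory dict, whereas on Hadoop each pass would typically be a separate MapReduce job that re-reads the graph from disk.

```python
from collections import Counter


def propagate_labels(adjacency, max_iters=20):
    """Relabel nodes of an in-memory graph until the labels stop changing.

    adjacency: dict mapping each node to a list of its neighbors
               (every neighbor must also appear as a key).
    Returns a dict mapping each node to its final label.
    """
    # Start with every node in its own community.
    labels = {node: node for node in adjacency}

    for _ in range(max_iters):
        changed = False
        # Visit nodes in sorted order so the result is deterministic.
        for node in sorted(adjacency):
            neighbors = adjacency[node]
            if not neighbors:
                continue
            # Adopt the most common label among the neighbors;
            # break ties by picking the smallest label.
            counts = Counter(labels[n] for n in neighbors)
            best = max(counts.values())
            new_label = min(l for l, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break  # converged: a full pass made no updates
    return labels


# Toy graph (assumed data): two disconnected triangles.
toy_graph = {
    0: [1, 2], 1: [0, 2], 2: [0, 1],
    3: [4, 5], 4: [3, 5], 5: [3, 4],
}
labels = propagate_labels(toy_graph)
print(labels)
```

Every pass reads and writes the same `labels` dict in place, so the cost of an extra iteration is only the loop itself, with no job startup or shuffle between passes.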

You can read about both experiments here:

Hadoop-based results: www.akshaybhat.com/LPMR/

Results using a single AWS High-Memory instance: www.akshaybhat.com/LPMR/GRAPHLAB

Even the startup Hunch has taken a similar approach, using a single machine with a large amount of memory rather than a Hadoop cluster.

1 comment

mcroydon 5528 days ago

Good call. The dataset was large enough that it didn't feel silly to use something like MapReduce, but the same thought has been in the back of my head the whole time.
