| HN Mirror



	by iskander 4409 days ago
	So the central piece of data is something like a 10-million-element RDD of (UserId, (MovieId, Rating))? If so, it sounds like that data would fit into a single in-memory sparse array. How does Spark's performance compare with a local implementation? By comparison, I'm trying (and failing) to work with RDDs of 100+ billion elements.
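A rough back-of-the-envelope check of the "fits in memory" point, as a sketch only. It assumes a coordinate-format (COO) sparse layout with 4-byte row and column indices and 8-byte ratings; the function name `coo_bytes` and those byte widths are assumptions for illustration, not anything from the article.

```python
def coo_bytes(nnz, index_bytes=4, value_bytes=8):
    """Approximate size of a COO sparse matrix.

    Each stored entry needs one row index, one column index, and one value.
    The byte widths are assumed defaults, not measured from any real system.
    """
    return nnz * (2 * index_bytes + value_bytes)


# 10 million ratings: about 160 MB, comfortable on a single machine.
small_mb = coo_bytes(10_000_000) / 1e6

# 100 billion elements: about 1.6 TB, well beyond one machine's RAM.
large_tb = coo_bytes(100_000_000_000) / 1e12

print(f"10M entries:  {small_mb:.0f} MB")
print(f"100B entries: {large_tb:.1f} TB")
```

Under these assumptions the two workloads differ by four orders of magnitude, which is why the first could plausibly run locally while the second could not.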


platz 4409 days ago

What is the difference between Spark and Storm? They both seem like "realtime compute engines".

*edit - from what I can see, Spark is a replacement for Hadoop (offline jobs), whereas Storm deals with online stream processing.


ironchef 4409 days ago

Storm is generally more of a per-event dataflow system for real-time or near-real-time computation (with each event flowing through N spouts and bolts), whereas Spark is more of an in-memory data processing system (with Spark Streaming being the "equivalent" of Storm).
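To make the distinction concrete, here is a toy sketch in plain Python. It uses neither the Storm nor the Spark API; `per_event` and `micro_batches` are hypothetical helpers that only mimic the two processing styles (one event at a time versus small batches, which is how Spark Streaming groups its input).

```python
from collections import Counter
from itertools import islice


def per_event(events, handle):
    """Storm-like style: push each event through the handler as it arrives."""
    for event in events:
        handle(event)


def micro_batches(events, batch_size):
    """Spark-Streaming-like style: cut the stream into small batches."""
    it = iter(events)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


events = ["a", "b", "a", "c", "a"]

# Per-event: state is updated once per incoming event.
counts = Counter()
per_event(events, lambda e: counts.update([e]))

# Micro-batch: state is updated once per batch of events.
batch_counts = Counter()
for batch in micro_batches(events, batch_size=2):
    batch_counts.update(batch)
```

Both styles reach the same counts here. The practical difference is latency and granularity: the per-event loop reacts to every event immediately, while the batched loop waits for a batch to fill before doing any work.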
