Hacker News new | ask | show | jobs
by iskander 4409 days ago
So the central piece of data is something like a 10 million element RDD of (UserId, (MovieId, Rating))? If so, it sounds like that data would fit into a single in-memory sparse array, how does Spark's performance compare with a local implementation?

By comparison, I'm trying (and failing) to work with RDDs of 100+ billion elements.

1 comments

What is the difference between Spark and Storm? They both seem like "realtime compute engines"

*edit - from what I can see Spark is a replacement for hadoop (offline jobs), where Storm deals with online stream processing

Storm is generally more of a dataflow "per event" real/near time computation system (with each event flowing through N spouts and bolts) whereas Spark is more of an in-memory data processing system (with Spark streaming being the "equivalent" to the storm system).