by iskander
4409 days ago
So the central piece of data is something like a 10-million-element RDD of (UserId, (MovieId, Rating))? If so, it sounds like that data would fit into a single in-memory sparse array. How does Spark's performance compare with a local implementation? By comparison, I'm trying (and failing) to work with RDDs of 100+ billion elements.
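
A rough back-of-envelope for the "fits in memory" guess, as a sketch only. It assumes a coordinate-style (COO) sparse layout with a 32-bit user index, a 32-bit movie index, and a 32-bit float rating per entry; the function name `coo_bytes` is made up for illustration, and real overhead (JVM objects, Python boxing, partitioning) would be higher.

```python
def coo_bytes(n_ratings, index_bytes=4, value_bytes=4):
    """Estimate bytes for a COO-style sparse layout.

    Each stored rating needs a row index (user), a column index (movie),
    and the value itself. Sizes are assumptions, not measured numbers.
    """
    return n_ratings * (2 * index_bytes + value_bytes)


small = coo_bytes(10_000_000)        # the ~10 million rating case
large = coo_bytes(100_000_000_000)   # the 100+ billion element case

print(f"10 million ratings:  {small / 1e6:.0f} MB")
print(f"100 billion ratings: {large / 1e12:.1f} TB")
```

Under those assumptions the smaller dataset is on the order of a hundred megabytes, which is why a single-machine comparison seems fair, while the larger one is past what one machine's memory would normally hold.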
*edit - from what I can see, Spark is a replacement for Hadoop (offline jobs), whereas Storm deals with online stream processing.