Hacker News
by iskander 4400 days ago
Well, the RDD is initially partitioned with a RangePartitioner over a dense key space of Longs, so each partition starts out with roughly the same number of keys. Each element is then expanded ~100x (each resulting object is significantly smaller than the original value). So, in theory, neither the total memory footprint nor the skew of the expanded RDD should be a problem.
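To make that reasoning concrete, here's a rough Python simulation (not Spark code) under made-up numbers: 1,000 dense Long keys, 8 partitions, and a 100x expansion per element. The helpers `range_bounds`, `expand`, and `partition_sizes` are hypothetical. Exact quantile bounds stand in for the sampled bounds Spark's RangePartitioner actually computes.

```python
import bisect


def range_bounds(keys, num_partitions):
    """Upper bounds that split the sorted keys into equal-sized ranges.

    Assumption: exact quantiles are used here; Spark's RangePartitioner
    estimates its bounds from a sample instead.
    """
    sorted_keys = sorted(keys)
    n = len(sorted_keys)
    return [sorted_keys[(i * n) // num_partitions - 1] for i in range(1, num_partitions)]


def partition_of(key, bounds):
    # Index of the first partition whose upper bound is >= key.
    return bisect.bisect_left(bounds, key)


def expand(key, factor):
    # Hypothetical expansion: one input element yields `factor` smaller records.
    return [(key, i) for i in range(factor)]


def partition_sizes(num_keys, num_partitions, factor):
    keys = range(num_keys)  # dense key space 0..num_keys-1
    bounds = range_bounds(keys, num_partitions)
    sizes = [0] * num_partitions
    for k in keys:
        # A flatMap-style expansion is a narrow transformation, so every
        # expanded record stays in its parent element's partition.
        sizes[partition_of(k, bounds)] += len(expand(k, factor))
    return sizes


def skew(sizes):
    # Ratio of the largest partition to the mean partition size (1.0 = perfectly even).
    return max(sizes) / (sum(sizes) / len(sizes))


sizes = partition_sizes(num_keys=1000, num_partitions=8, factor=100)
print(sizes)        # → [12500, 12500, 12500, 12500, 12500, 12500, 12500, 12500]
print(skew(sizes))  # → 1.0
```

Note that this only counts records. Skew in bytes would also depend on how evenly sized the expanded objects are, which the simulation doesn't model.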