by hobbyist 4741 days ago
Good question. I did read the Spark paper, and one reason I found for Spark doing so much better than Hadoop is that it avoids the repeated serialization and deserialization that Hadoop cannot avoid. RDDs, as mentioned by @rxin, are in-memory objects, so they do not need to be serialized and deserialized each time another operation is applied to the data.
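
To make the point concrete, here is a toy Python sketch. It is not Spark or Hadoop code: the function names are made up for illustration, and `pickle` only stands in for writing intermediate results out and reading them back between chained steps. Both functions produce the same result, but the first pays for a serialize/deserialize round trip after every operation, while the second keeps the objects in memory the whole way through.

```python
import pickle


def run_with_roundtrips(records, ops):
    """Apply ops in order, serializing and deserializing after every step.

    Hypothetical stand-in for a chain of jobs that hand data to each other
    through serialized intermediate output.
    """
    data = list(records)
    roundtrips = 0
    for op in ops:
        data = [op(x) for x in data]
        blob = pickle.dumps(data)   # stand-in for writing intermediate output
        data = pickle.loads(blob)   # stand-in for the next step reading it back
        roundtrips += 1
    return data, roundtrips


def run_in_memory(records, ops):
    """Apply ops in order on live objects, with no serialization in between.

    Hypothetical stand-in for chaining operations on an in-memory dataset.
    """
    data = list(records)
    for op in ops:
        data = [op(x) for x in data]
    return data, 0


if __name__ == "__main__":
    ops = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
    print(run_with_roundtrips(range(5), ops))
    print(run_in_memory(range(5), ops))
```

The round-trip count grows with the number of chained operations, which is why the gap would show up most in multi-step or iterative workloads.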