by hobbyist
4741 days ago
Good question. I did read the Spark paper, and one reason I found for Spark doing so much better than Hadoop is that it avoids the unnecessary serialization and deserialization that Hadoop cannot avoid, since each chained MapReduce job writes its output to disk for the next job to read back. RDDs, as mentioned by @rxin, are in-memory objects, so they do not need to be serialized and deserialized repeatedly when multiple operations are applied to the same data.
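To make the difference concrete, here's a toy sketch in Python. The function names are hypothetical and are not Spark or Hadoop APIs. The first pipeline serializes each stage's output and deserializes it for the next stage, roughly like chained MapReduce jobs going through HDFS. The second keeps intermediate results as plain in-memory objects, which is the RDD-style chaining described above. It models only this one difference; real Spark still serializes data for shuffles and when you pick a serialized storage level.

```python
import pickle

# Toy model (hypothetical, not Spark/Hadoop APIs): apply a chain of
# per-record operations to a dataset in two different ways.

def run_with_materialization(data, ops):
    """MapReduce-style chaining: every stage's output is serialized
    (standing in for a write to HDFS) and deserialized again by the
    next stage (standing in for the next job reading its input)."""
    records = list(data)
    round_trips = 0
    for op in ops:
        records = [op(r) for r in records]
        blob = pickle.dumps(records)   # stage output "written to disk"
        records = pickle.loads(blob)   # next stage "reads it back"
        round_trips += 1
    return records, round_trips


def run_in_memory(data, ops):
    """RDD-style chaining (simplified): intermediate results stay as
    ordinary Python objects, so nothing is serialized between stages."""
    records = list(data)
    for op in ops:
        records = [op(r) for r in records]
    return records, 0
```

Both pipelines return the same records; the only difference is the serialize/deserialize round trip per stage, which grows with the number of chained operations.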