| HN Mirror



	by hobbyist 4788 days ago
	Good question. I did read the Spark paper, and one reason I found for Spark doing so much better than Hadoop is that it avoids the unnecessary serialization and deserialization that Hadoop simply cannot avoid. RDDs, as mentioned by @rxin, are in-memory objects and thus do not require frequent serialization/deserialization when multiple operations are applied to the same data.
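
To make that point concrete, here is a toy sketch in plain Python. It is not Spark or Hadoop code: `pickle` merely stands in for whatever serialization format a real job would use, and the function names `run_with_roundtrips` and `run_in_memory` are made up for illustration. It only shows that chaining operations on in-memory objects skips the per-stage serialize/deserialize round-trip.

```python
import pickle

def run_with_roundtrips(data, stages):
    """Toy model of chained jobs that persist output between stages:
    each stage's result is serialized and deserialized before the next."""
    roundtrips = 0
    for stage in stages:
        data = [stage(x) for x in data]
        blob = pickle.dumps(data)   # stands in for writing intermediate output
        data = pickle.loads(blob)   # stands in for reading it back
        roundtrips += 1
    return data, roundtrips

def run_in_memory(data, stages):
    """Toy model of chained operations on in-memory objects:
    each stage consumes the previous result directly."""
    roundtrips = 0                  # never incremented: nothing is serialized
    for stage in stages:
        data = [stage(x) for x in data]
    return data, roundtrips

# Three hypothetical operations applied to the same small dataset.
stages = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]
records = list(range(5))

disk_result, disk_roundtrips = run_with_roundtrips(records, stages)
mem_result, mem_roundtrips = run_in_memory(records, stages)
```

Both paths compute the same result; the difference is only the number of serialization round-trips, which grows with the number of chained stages in the first version and stays at zero in the second.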