by beagle3
2899 days ago
Spark is significantly more efficient than Hadoop. I don’t know about your specific workload, but I’ve seen quite a few Hadoop setups that ran at 100% load most of the time and were replaced by relatively simple non-Hadoop code that used 2% to 10% of the hardware and ran about as fast. I didn’t spend much time evaluating the “before” setups, but at least one workload spent 90% of that load on [de]serialization. The link isn’t mine; it comes from Frank McSherry, who is commenting in this thread. I hope he can chime in on why he chose this specific example, but it correlates very well with my experience.