by madhadron
2429 days ago
McSherry et al.'s paper "Scalability! But at what COST?" is worth reading (COST stands for "Configuration that Outperforms a Single Thread"). For the graph computations it benchmarks, a well-written single-threaded, single-core implementation typically outperforms Spark-based and similar distributed systems running on many cores. The best rule of thumb I'm aware of: unless your computation can't fit on a single machine, or your jobs are likely to fail before completing because of their size and duration, you are generally better off without Spark or similar systems. And if sampling can get you back onto a single machine, you're definitely better off.
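To make the "single thread" baseline concrete, here is a minimal Python sketch of connected components via union-find, the kind of simple single-threaded graph algorithm the paper compares against distributed systems. It is an illustration, not the authors' code; the function name `connected_components` and the input format (node ids `0..num_nodes-1` plus a list of edge pairs) are assumptions.

```python
def connected_components(num_nodes, edges):
    """Label every node with the smallest node id in its component (single-threaded union-find)."""
    # parent[x] == x means x is the root of its set.
    parent = list(range(num_nodes))

    def find(x):
        # Path halving: point each visited node at its grandparent while walking to the root.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # One pass over the edges; they could just as well be streamed from a file.
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Attach the larger root under the smaller one, so each root is its component's minimum id.
            if ru < rv:
                parent[rv] = ru
            else:
                parent[ru] = rv

    return [find(x) for x in range(num_nodes)]


if __name__ == "__main__":
    print(connected_components(6, [(0, 1), (1, 2), (3, 4)]))  # [0, 0, 0, 3, 3, 5]
```

The whole state is one integer array with one entry per node, so memory scales with the node count rather than the edge count, and the edges only need to be read once in sequence.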