by Barraketh
2429 days ago
I know that a lot of work has gone into Spark, but my personal experience with it has been pretty negative. I've spent a lot of time at my job trying to tune it for our workflows (extremely deep queries), with only moderate success. I just built a proof-of-concept custom SQL execution engine that was 200x faster than Spark on the same workflows. Our requirements are pretty non-standard, but I find these benchmarks pretty easy to believe.

The best rule of thumb I'm aware of is: unless your computation can't fit on a single machine, or your jobs are so large and long-running that they are likely to fail before completing, you are generally better off without Spark or similar systems. And if sampling can get you back onto a single machine, then you're really better off doing that.