by missosoup
2234 days ago
I've been in the industry for 10+ years. I've worked with everything from telco metrics firehoses to bank customer event streams to deep learning. The Venn intersection of conditions where Spark makes sense is really rather narrow. A single high-spec instance running leaner tooling will generally meet one's requirements while blowing Spark out of the water in terms of perf and cost. Operationally, Spark is a huge PITA, hence Databricks and a host of other offerings, I guess including this one, that try to manage the pain. Meanwhile, something like dask-kubernetes will cater to the same use case with significantly lower operational complexity and, again, much better perf and cost efficiency. I can't really think of a scenario where I'd choose to use Spark on a greenfield project today.