Hacker News
by rr808 398 days ago
> a) You can easily run Spark jobs on a single box. Just set executors = 1.

Sure, but why would you do this? Just using pandas or duckdb or even bash scripts makes your life much easier than having to deal with Spark.

1 comments

For when you need more executors without rewriting your logic.
Using a Python solution like Dask might actually be better, because you can work with all of the Python data frameworks and tools, but you can also easily scale it if you need it without having to step into the Spark world.
But Dask is orders of magnitude slower than Spark.

And you can still use Python data frameworks with Spark, so I'm not sure what you'd gain.
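
For anyone following along, here is a rough sketch of the "single box now, more executors later" idea from the thread, assuming PySpark. The helper names (master_url, count_by_bucket, main) and the toy aggregation are made up for illustration; only the master URL changes between a local run and a cluster run, while the job logic stays the same.

```python
# Sketch only: helper names and the toy job are illustrative assumptions,
# not something quoted from the thread.

def master_url(threads=1, cluster=None):
    """Pick a Spark master URL: a cluster URL if one is given, else local mode."""
    if cluster:
        return cluster
    # "local[N]" runs Spark in a single JVM with N worker threads;
    # "local[*]" uses as many threads as there are cores.
    return "local[*]" if threads is None else f"local[{threads}]"


def count_by_bucket(spark, n=1000, buckets=10):
    """Job logic: identical whether it runs on one box or on a cluster."""
    from pyspark.sql import functions as F  # imported lazily; needs pyspark

    df = spark.range(n).withColumn("bucket", F.col("id") % buckets)
    return df.groupBy("bucket").count().orderBy("bucket")


def main(cluster=None):
    """Run the job; requires pyspark and a Java runtime to be installed."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master(master_url(threads=1, cluster=cluster))
        .appName("single-box-sketch")
        .getOrCreate()
    )
    try:
        count_by_bucket(spark).show()
    finally:
        spark.stop()
```

Calling main() runs it on one worker thread on the local machine; passing a cluster master URL instead would reuse count_by_bucket unchanged. Whether that convenience is worth Spark's overhead on a small dataset is exactly what the thread is arguing about.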