by threeseed
390 days ago
a) You can easily run Spark jobs on a single box. Just set executors = 1. b) The reason centralised clusters exist is that you can't have dozens or hundreds of data engineers and scientists all copying company data onto their laptops, causing support headaches because they can't install library X, and making productionising impossible. There are bigger concerns than your personal productivity.

Sure, but why would you do this? Just using pandas or duckdb or even bash scripts makes your life much easier than having to deal with Spark.