Hacker News
by legerdemain 1245 days ago
In a twist, pandas programs don't get parallelized when run on Spark. Someone had to write a parallel layer that duplicated the pandas API, because otherwise the entire pandas program ended up executing on a single machine.

There is the pandas API on Spark (originally Koalas), included in Spark itself since version 3.2. Switching to it is very easy, often just a matter of changing the pandas import to pyspark.pandas, and you get parallelization.
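A minimal sketch of what "the switch is easy" means in practice, assuming Spark 3.2 or later, where Koalas ships as the `pyspark.pandas` module. The function name `total_by_key` and the sample data are hypothetical. The same pandas-style code is written once and handed either plain `pandas` (runs in one process) or `pyspark.pandas` (distributed across Spark executors):

```python
import pandas as pd


def total_by_key(xpd):
    """Aggregate a small frame with whichever pandas-like module is passed in.

    xpd is either the plain `pandas` module or `pyspark.pandas`
    (hypothetical helper for illustration only).
    """
    df = xpd.DataFrame({"key": ["a", "b", "a", "c"], "value": [1, 2, 3, 4]})
    # Identical groupby/sum call for both modules; only the import differs.
    # sort_index() makes the order deterministic, since Spark does not
    # guarantee row order after a shuffle.
    result = df.groupby("key")["value"].sum().sort_index()
    # A pyspark.pandas Series is collected back to a local pandas Series.
    return result if isinstance(result, pd.Series) else result.to_pandas()


local_result = total_by_key(pd)
```

With pyspark installed and a Spark session available, the distributed version would be `import pyspark.pandas as ps` followed by `total_by_key(ps)`; no other code changes. Calling `to_pandas()` pulls the result onto the driver, so it is only sensible for small aggregated outputs like this one.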