by rev_d 3380 days ago
Wouldn't it make a lot of sense to just use PySpark with RDDs? Latency would be relatively high, but it would bypass the GIL and it's also a more modern approach.

In my experience PySpark is much more flaky and annoying than doing parallel computing with more 'Python native' tools. It only really makes sense once you outgrow small clusters and really need huge infrastructure.
What python tools do you use for small clusters?
Dask would be an option.
Was going to say that. Or IPython Parallel (ipyparallel) if you want to go lower level.
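For anyone wondering what the 'Python native' route looks like before you even reach for a cluster tool, here is a minimal single-machine sketch using only the standard library. It sidesteps the GIL by running work in separate processes. The function names `cpu_bound` and `run_parallel` are made up for illustration, and this is not a claim about how anyone in this thread sets things up.

```python
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n: int) -> int:
    """Hypothetical workload: pure-Python arithmetic that holds the GIL."""
    return sum(i * i for i in range(n))


def run_parallel(inputs, max_workers=2):
    """Hypothetical helper: fan the inputs out over worker processes."""
    # Each worker is a separate process with its own interpreter and its
    # own GIL, so CPU-bound tasks can run at the same time.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the inputs.
        return list(pool.map(cpu_bound, inputs))


if __name__ == "__main__":
    # The guard matters on platforms that start workers by re-importing
    # this module (e.g. Windows and macOS defaults).
    print(run_parallel([10, 100, 1000]))
```

The futures-style submit/map pattern is also what the Dask distributed client exposes, so code shaped like this should carry over to a small cluster without much restructuring, though the details will depend on your setup.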