HN Mirror



	by rev_d 3380 days ago
	Wouldn't it make a lot of sense to just use PySpark with RDDs? Latency would be relatively high, but it would bypass the GIL and it's a more modern approach.

1 comments

mangecoeur 3380 days ago

In my experience PySpark is much flakier and more annoying than doing parallel computing with more 'Python native' tools. It only really makes sense once you've outgrown small clusters and really need huge infrastructure.
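
To illustrate the 'Python native' route on a single machine, here is a minimal sketch using only the standard library's concurrent.futures. It is not the commenter's own code: the prime-counting workload and the names is_prime, count_primes, split_range and parallel_count_primes are hypothetical, chosen only to give the workers something CPU-bound to do. Each worker is a separate process with its own interpreter, so the GIL does not serialize the work.

```python
import math
from concurrent.futures import ProcessPoolExecutor


def is_prime(n):
    """Trial division; deliberately CPU-bound so parallelism pays off."""
    if n < 2:
        return False
    if n < 4:
        return True  # 2 and 3
    if n % 2 == 0:
        return False
    for i in range(3, math.isqrt(n) + 1, 2):
        if n % i == 0:
            return False
    return True


def count_primes(bounds):
    """Count primes in the half-open range [lo, hi)."""
    lo, hi = bounds
    return sum(1 for n in range(lo, hi) if is_prime(n))


def split_range(lo, hi, parts):
    """Split [lo, hi) into at most `parts` contiguous chunks."""
    step = max(1, math.ceil((hi - lo) / parts))
    return [(start, min(start + step, hi)) for start in range(lo, hi, step)]


def parallel_count_primes(lo, hi, workers=4):
    """Fan the chunks out to worker processes and add up the results."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_primes, split_range(lo, hi, workers)))


if __name__ == "__main__":
    # The guard matters: worker processes may re-import this module.
    print(parallel_count_primes(0, 200_000))
```

This only covers one machine. Spreading the same pattern over several hosts is where a scheduler comes in.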


splike 3380 days ago

What Python tools do you use for small clusters?


elyase 3380 days ago

Dask would be an option.


mangecoeur 3379 days ago

Was going to say that. Or IPython Parallel if you want to go lower level.
