by mooncitizen 1260 days ago
Can you please give an example of Dask being more flexible than Spark?
2 comments

The Dask docs at https://docs.dask.org/en/stable/spark.html note: "However, Dask is able to easily represent far more complex algorithms and expose the creation of these algorithms to normal users [compared to Spark]", linking to: http://matthewrocklin.com/blog/work/2015/06/26/Complex-Graph...
Yes, say you've got imaging data in 3 (or higher) dimensions as NumPy arrays and want to run some sort of algorithm on it across multiple cores/machines. That applies to both data analytics and simulations.
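A minimal sketch of that idea, assuming a small random volume stands in for real imaging data. The array shape, the chunk size, and the normalize-then-project step are all made up for illustration; only `da.from_array`, the reductions, and `.compute()` are standard dask.array calls.

```python
import numpy as np
import dask.array as da

# Hypothetical 3-D image volume (random data standing in for a real scan).
rng = np.random.default_rng(0)
volume = rng.random((64, 64, 64))

# Wrap the NumPy array in a Dask array split into 32x32x32 chunks,
# so each chunk can be processed by a different core or worker.
darr = da.from_array(volume, chunks=(32, 32, 32))

# Build a lazy task graph: normalize the whole volume, then take a
# maximum-intensity projection along the first axis.
normalized = (darr - darr.mean()) / darr.std()
projection = normalized.max(axis=0)

# Nothing has run yet; compute() executes the graph in parallel.
result = projection.compute()
print(result.shape)
```

The same code should run unchanged on a cluster if a `dask.distributed` client is active, since the scheduler is chosen at compute time rather than in the array code.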

dask.bag has generic parallel processing capabilities. You can query a database, a REST API, or some other source in parallel, then merge the results into dataframes across Dask workers.