by gberger 2069 days ago
What do you recommend for distributed data processing?
2 comments

Dask is a great alternative for distributed computing as well: https://github.com/dask/dask

IMO, Spark is better for some tasks and Dask is better for others.

The first step is to decide whether you really need distributed data processing. I think this is the point the author is making. I've seen GB-sized data treated as "BIG DATA", and it's unbelievable what architectural patterns get built to support it.
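
For what it's worth, here is a rough back-of-the-envelope check for that first step. It is only a sketch: the function names and the 5x in-memory expansion factor are my own assumptions, not anything from the article, and you should plug in numbers that match your data format and your machine.

```python
import os

GB = 1024 ** 3


def total_size_bytes(paths):
    """Sum the on-disk size of the given files."""
    return sum(os.path.getsize(p) for p in paths)


def fits_on_one_machine(data_bytes, ram_bytes, expansion=5.0):
    """Return True if the data, inflated by `expansion`, fits in RAM.

    `expansion` is an assumed rule-of-thumb multiplier for how much larger
    the data gets once parsed into memory; it is not a measured value.
    """
    return data_bytes * expansion <= ram_bytes


# A 4 GB dataset on a 64 GB machine: 20 GB estimated, so one machine is enough.
print(fits_on_one_machine(4 * GB, 64 * GB))    # True
# A 200 GB dataset on the same machine: 1000 GB estimated, so it does not fit.
print(fits_on_one_machine(200 * GB, 64 * GB))  # False
```

If the check passes with room to spare, a single machine is probably worth trying before reaching for Spark or Dask.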