by marcinzm 1670 days ago
If Dask doesn't consider their distributed version to be production ready, even on a single node (which is what we were using), then they should label it as such.

Do you have a citation showing that Dask doesn't consider its distributed version to be production ready? If it is your own view, then that's OK.

I think Dask is heavily used in real production systems. Let me cite one such usage here, from Capital One (no affiliation; I'm just referencing a big bank for 'production ready' purposes): https://www.capitalone.com/tech/machine-learning/dask-and-ra... (I'm also not necessarily suggesting any RAPIDS/GPU usage; you can decouple that from the article.)

And note the article is from Nov 2019. Two years is a substantial amount of time for further improvements.

Your post seemed to argue that Dask is fine if you stick to relatively small data that fits on a single node and then switch to pandas. You also noted that this is what Dask recommends. The implication being that I ran into issues because I didn't use Dask "the right way."

I don't see how you can argue both that and that Dask distributed is production ready at the same time.

I've been in big data for 15 years and was probably one of the first few thousand production Hadoop users. If you think "a big company used a big data tech, so it's production ready" is an argument, then I've got a few bridges to sell you. A lot of companies use a lot of technologies that they spend a lot of time beating into shape until, for their specific use cases, they work just well enough to not get everyone fired.

In the end, it's not about an OPEN SOURCE tool being perfect but about whether it helps you solve a problem. If it did not help you and YOU don't consider it production ready, then that's fine. But you seem to argue that Dask should put this disclaimer out there. That would imply that many other open source tools, including Spark, would have to do the same.

Dask has solved specific problems for us and we are grateful for it. I remain open-minded about other choices and listed them based on my understanding of them.

Switching to pandas when you can follows the philosophy of keeping things simple. I like the flexibility of going back and forth between the two as and when I choose.