Hacker News
by theptip 1417 days ago
Thanks for the experience report - I have Dagster and Prefect on my shortlist to evaluate next time I need to build this, and Dagster seems the most promising, so it’s good to get another datapoint.

One Q - it seems to me that another possible solve (and probably how the big guys tend to do it) is to use a dataflow engine like Spark/Flink. Did you compare a managed platform like Google Dataproc? They also have serverless if you don’t want a heavy managed cluster, which might make this approach more viable for non-huge companies that wouldn’t utilize a min-spec cluster. (When I last evaluated this they didn’t have serverless which was a dealbreaker for my small scale).

1 comment

We didn't look into a dataflow engine specifically, in part because we have a heterogeneous set of workloads. Our core use case is loading mission-critical data in chunks, but we also coordinate SaaS tools and managed services like SageMaker. So being able to "just run this arbitrary code" reliably and scalably is an important requirement in our case, not just the dataflow part of things.