Hacker News
by vosper 1188 days ago
Would you see BuildFlow as a competitor to Dagster, Flink, or Spark Streaming?

I'm about to build a pipeline that needs to pass thousands of docs a minute through a variety of enrichments (ML models, third-party APIs, etc.) and then dump the final enriched doc in ES.

There are so many pipeline products, workflow engines, and MLOps solutions that I'm very confused about which technologies I should be looking at. I think something looks good (Temporal) but then read it's not really for large volumes of streaming data. Or I look at Flink, which can handle massive volumes, but it doesn't seem as easy to wire up as other options. I think Dagster looks nice but can't find any answer (even in their Slack) about what kind of volumes it can handle...


You can think of BuildFlow as a lightweight alternative to Flink / Spark Streaming. These streaming frameworks are great when you want to react to events in real time (e.g. you want to trigger some processing logic every time a file is uploaded to cloud storage). Dagster is more focused on scheduling jobs, and might be a good fit if you have some batch jobs you want to trigger occasionally.
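
To make the streaming vs. scheduled distinction concrete, here is a rough stdlib-only sketch. It is not BuildFlow, Flink, or Dagster code; `Event`, `process`, `run_streaming`, and `run_batch` are all made-up names for illustration. The streaming loop handles each event as soon as it arrives, while the batch function processes whatever has piled up since the last scheduled run.

```python
import queue

Event = dict  # hypothetical event shape, e.g. {"file": "a.csv"}


def process(event: Event) -> Event:
    """Stand-in for the real processing logic; it only tags the event."""
    return {**event, "processed": True}


def run_streaming(source: queue.Queue, sink: list) -> None:
    """Streaming style: handle each event as soon as it arrives.

    A None sentinel on the queue ends the stream.
    """
    while True:
        event = source.get()  # blocks until the next event is available
        if event is None:
            break
        sink.append(process(event))


def run_batch(accumulated: list, sink: list) -> None:
    """Batch style: a scheduler would call this occasionally to process
    everything collected since the previous run."""
    sink.extend(process(e) for e in accumulated)
    accumulated.clear()


if __name__ == "__main__":
    q = queue.Queue()
    for name in ("a.csv", "b.csv"):
        q.put({"file": name})
    q.put(None)  # end of stream

    out = []
    run_streaming(q, out)
    print(out)
```

In a real deployment the queue would be a message bus subscription and the batch call would be triggered by a scheduler; both are simplified away here.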

BuildFlow can run a simple Pub/Sub -> light processing -> BigQuery pipeline at about 5-7k messages/second on a 4-core VM (tested on GCP’s n1-standard-4 machines). For your case, you might be able to get away with running on a single machine with 4-8 cores.
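
As a back-of-envelope check on that estimate, here is a small sketch. Only the 5-7k messages/second benchmark comes from the numbers above; the 5,000 docs/minute workload is an assumed stand-in for "thousands of docs a minute". Enrichments that call ML models or third-party APIs will cost more per doc than "light processing", so treat the result as an upper bound on headroom, not a prediction.

```python
# Back-of-envelope headroom estimate. Only the benchmark figure comes from the
# thread; the workload number is an assumption for illustration.

BENCH_LOW_MSGS_PER_SEC = 5_000   # low end of the quoted 4-core benchmark
ASSUMED_DOCS_PER_MINUTE = 5_000  # hypothetical reading of "thousands of docs a minute"


def required_rate_per_sec(docs_per_minute: float) -> float:
    """Convert a per-minute document rate into a per-second rate."""
    return docs_per_minute / 60.0


def headroom(benchmark_per_sec: float, docs_per_minute: float) -> float:
    """How many times larger the benchmark rate is than the required rate."""
    return benchmark_per_sec / required_rate_per_sec(docs_per_minute)


if __name__ == "__main__":
    need = required_rate_per_sec(ASSUMED_DOCS_PER_MINUTE)
    ratio = headroom(BENCH_LOW_MSGS_PER_SEC, ASSUMED_DOCS_PER_MINUTE)
    print(f"required: {need:.1f} docs/s")
    print(f"headroom vs. light-processing benchmark: {ratio:.0f}x")
```

Under these assumptions the required rate is well under 100 docs/second, which is why a single 4-8 core machine looks plausible as long as the enrichment steps are not the bottleneck.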

I’d be happy to connect outside of HN if you’d like me to dig into your use case more! You can reach me at josh@launchflow.com

edit: You can also reach out on our Discord: https://discordapp.com/invite/wz7fjHyrCA

Thanks for that. Sounds like it might fit what we want. I'll reach out if I have any more questions.

Are you tied to GCP services like Pub/Sub and BigQuery? We're on AWS, not GCP.

AWS support is in the queue, but we only have GCP services at the moment. What services on AWS do you need access to? We can move them to the front of the queue to help out.

Feel free to reach out even if this doesn’t work with your timeline. I might be able to help you come up with another solution, and I’m always interested to hear new use cases!