by vosper
1188 days ago
Would you see BuildFlow as a competitor to Dagster, Flink, or Spark Streaming? I'm about to build a pipeline that needs to pass thousands of docs a minute through a variety of enrichments (ML models, third-party APIs, etc.) and then dump the final enriched doc in ES. There are so many pipeline products, workflow engines, and MLOps solutions that I'm very confused about which technologies I should be looking at. I think something looks good (Temporal) but then read that it's not really meant for large volumes of streaming data. Or I look at Flink, which can handle massive volumes, but it doesn't seem as easy to wire up as other options. I think Dagster looks nice, but I can't find any answer (even in their Slack) about what kind of volumes it can handle...
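The shape of the pipeline described above (bounded-concurrency enrichment followed by batched writes to a search index) can be sketched in plain Python `asyncio`. This is a minimal sketch under stated assumptions, not BuildFlow's, Dagster's, or Flink's API: `add_language`, `add_entities`, and the `sink` callable are hypothetical stand-ins, and in a real deployment the sink would be something like a bulk write to Elasticsearch's `_bulk` endpoint.

```python
import asyncio
from typing import Any, Awaitable, Callable, Iterable

Doc = dict[str, Any]
Enricher = Callable[[Doc], Awaitable[Doc]]
Sink = Callable[[list[Doc]], Awaitable[None]]


async def add_language(doc: Doc) -> Doc:
    """Hypothetical ML-model enrichment; the sleep simulates inference latency."""
    await asyncio.sleep(0.01)
    return {**doc, "lang": "en"}


async def add_entities(doc: Doc) -> Doc:
    """Hypothetical third-party API enrichment; the sleep simulates network latency."""
    await asyncio.sleep(0.02)
    return {**doc, "entities": []}


async def enrich(doc: Doc, enrichers: list[Enricher]) -> Doc:
    """Run each enrichment step in order, feeding one step's output to the next."""
    for step in enrichers:
        doc = await step(doc)
    return doc


async def run_pipeline(
    docs: Iterable[Doc],
    enrichers: list[Enricher],
    sink: Sink,
    concurrency: int = 64,
    batch_size: int = 500,
) -> None:
    """Enrich docs with at most `concurrency` in flight; flush results to `sink` in batches."""
    sem = asyncio.Semaphore(concurrency)

    async def process(doc: Doc) -> Doc:
        async with sem:  # caps simultaneous model/API calls
            return await enrich(doc, enrichers)

    tasks = [asyncio.create_task(process(d)) for d in docs]
    batch: list[Doc] = []
    for finished in asyncio.as_completed(tasks):
        batch.append(await finished)
        if len(batch) >= batch_size:
            await sink(batch)  # e.g. one bulk request to the search index
            batch = []
    if batch:  # flush the final partial batch
        await sink(batch)
```

This sketch schedules every doc up front, which suits a finite batch but not an unbounded stream, and it has no retries, checkpointing, or source backpressure; handling those is much of what the dedicated frameworks are for.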
BuildFlow can run a simple Pub/Sub -> light processing -> BigQuery pipeline at about 5-7k messages/second on a 4-core VM (tested on GCP's n1-standard-4 machines). For your case, you might be able to get away with running on a single machine with 4-8 cores.
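To put the quoted benchmark next to "thousands of docs a minute", a back-of-envelope check may help. The sketch below restates the 5-7k messages/second figure per minute and uses Little's law (items in flight = arrival rate × time each item spends in the system) to estimate how many enrichment calls would need to be in flight at once. The 5,000 docs/minute workload and 300 ms of per-doc enrichment latency are hypothetical numbers for illustration, not measurements; ML inference and third-party API calls are heavier than the "light processing" in the benchmark, so real per-core throughput will likely be lower.

```python
import math


def per_minute(rate_per_second: int) -> int:
    """Convert a per-second throughput figure to per-minute."""
    return rate_per_second * 60


def required_in_flight(docs_per_minute: int, latency_ms: int) -> int:
    """Little's law: average items in flight = arrival rate x time in system.

    Rounded up, since a fraction of a concurrent request can't be run.
    """
    return math.ceil(docs_per_minute * latency_ms / 60_000)


if __name__ == "__main__":
    # The quoted benchmark range, restated per minute.
    print(per_minute(5_000), per_minute(7_000))  # 300000 420000
    # Hypothetical workload: 5,000 docs/min, 300 ms total enrichment latency per doc.
    print(required_in_flight(5_000, 300))  # 25
```

Swapping in measured latencies for your own enrichments gives a rough estimate of the average concurrency the pipeline needs to sustain the target rate.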
I’d be happy to connect outside of HN if you’d like me to dig into your use case more! You can reach me at josh@launchflow.com
edit: You can also reach out on our Discord: https://discordapp.com/invite/wz7fjHyrCA