We don't support any snapshotting or checkpointing directly in BuildFlow at the moment, but these are great features we should support.
But we do have some fault tolerance baked into our I/O operations. Specifically, for Google Cloud Pub/Sub we don't ack a message until it has been successfully processed and written to the sink, so if there's a bug or a transient failure, the message will be redelivered later, depending on your subscription configuration (e.g. its ack deadline and retry policy).
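Here is a minimal sketch of that ack-after-write pattern in Python. `FakeSubscription`, `Message`, `run_pipeline`, and `flaky_upper` are hypothetical names made up for illustration. This is not BuildFlow's code or the google-cloud-pubsub client API, and redelivery is simulated with an in-memory queue rather than a real ack deadline.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    message_id: str
    data: str
    delivery_attempt: int = 0


class FakeSubscription:
    """Hypothetical in-memory stand-in for a Pub/Sub subscription (not a real client)."""

    def __init__(self, payloads, max_delivery_attempts=5):
        self._pending = deque(Message(str(i), p) for i, p in enumerate(payloads))
        self.max_delivery_attempts = max_delivery_attempts
        self.acked = []
        self.dead_lettered = []

    def pull(self):
        # Hand out the next pending message, or None when the queue is drained.
        if not self._pending:
            return None
        msg = self._pending.popleft()
        msg.delivery_attempt += 1
        return msg

    def ack(self, msg):
        # Acked messages are never redelivered.
        self.acked.append(msg.message_id)

    def nack(self, msg):
        # Unacked messages go back on the queue (like an expired ack deadline),
        # unless they have used up their delivery attempts.
        if msg.delivery_attempt >= self.max_delivery_attempts:
            self.dead_lettered.append(msg.message_id)
        else:
            self._pending.append(msg)


def run_pipeline(subscription, process, sink):
    """Ack each message only after it has been processed AND written to the sink."""
    while (msg := subscription.pull()) is not None:
        try:
            sink.append(process(msg.data))  # stand-in for a sink write, e.g. to BigQuery
        except Exception:
            subscription.nack(msg)  # leave it for redelivery
            continue
        subscription.ack(msg)


# Usage: "b" fails transiently on its first delivery and succeeds on redelivery.
attempts = {}


def flaky_upper(data):
    attempts[data] = attempts.get(data, 0) + 1
    if data == "b" and attempts[data] == 1:
        raise RuntimeError("transient failure")
    return data.upper()


sub = FakeSubscription(["a", "b", "c"])
sink = []
run_pipeline(sub, flaky_upper, sink)
print(sink)       # ['A', 'C', 'B']
print(sub.acked)  # ['0', '2', '1']
```

Because the ack only happens after the write, a crash between the write and the ack causes the message to be redelivered and written again. The guarantee is therefore at-least-once, so in this sketch the sink would need to tolerate duplicates.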
All of our processing is done via Ray (https://www.ray.io/). Our early benchmarks show about 5k messages per second on a single 4-core VM, but we believe we can increase that with some more optimizations.
This benchmark consumed a Google Cloud Pub/Sub stream and wrote the output to BigQuery.
I don’t see anything on snapshotting or checkpointing like Flink. Is this just for stateless jobs?