| HN Mirror

After reading this I suggest having a look at apache beam if you are not using it already. I have the feeling that you can achieve the same with way fewer elements in the stack.

Also, were you to decide to run it on another "runner".

Additionally, you can truly reuse your apache beam logic for streaming and batch jobs. Other tools perhaps can do that, but from some experiments I ran some time ago it's not as straightforward.

And finally, if one or more of your processing steps need access to GPUs you can request that (granted that your runner supports that: https://beam.apache.org/documentation/runtime/resource-hints... ).