Hacker News new | ask | show | jobs
by bradford 777 days ago
> I don't think this is a particularly insightful article.

Data engineering can be lonely. I like seeing the approach that others are taking, and this article gives me a good idea of the implementation stack.

2 comments

After reading this I suggest having a look at apache beam if you are not using it already. I have the feeling that you can achieve the same with way fewer elements in the stack.

Also, were you to decide to run it on another "runner".

Additionally, you can truly reuse your apache beam logic for streaming and batch jobs. Other tools perhaps can do that, but from some experiments I ran some time ago it's not as straightforward.

And finally, if one or more of your processing steps need access to GPUs you can request that (granted that your runner supports that: https://beam.apache.org/documentation/runtime/resource-hints... ).

Same here. I just wished OP pointed to an example repo with a minimum working example.