HN Mirror


by jonathankoren 2669 days ago

>At work I have what are essentially cron jobs running scripts which invoke sklearn pipelines.

If cron works for you, that’s great, and you should continue to use it. However, I would be interested to know how many data sources you have, how you handle failures in pipe segments, and your general throughput.

In more complicated flows, ones that require different data sets to be combined or that have lots of data flows depending on each other, moving to a DAG with event triggering is a much better setup in my experience. Data is generated faster, errors are handled more gracefully, and recovery is much faster, since data is only recalculated when needed.
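To make the "only recalculated when needed" point concrete, here is a minimal sketch in plain Python. It is not how any particular scheduler works. The step names (`raw_a`, `clean_a`, `joined`, `model`) and the `stale_steps` helper are made up for illustration, and it assumes Python 3.9+ for the standard-library `graphlib` module. Given a dependency graph and the set of sources that changed, it returns just the downstream steps that need to rerun, in dependency order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps it depends on. raw_a and raw_b are the upstream data sources.
DEPS = {
    "clean_a": {"raw_a"},
    "clean_b": {"raw_b"},
    "joined": {"clean_a", "clean_b"},
    "model": {"joined"},
}


def stale_steps(deps, changed):
    """Return the steps to rerun, in dependency order, after `changed` updates."""
    stale = set(changed)
    to_run = []
    # static_order() yields every node after all of its dependencies.
    for step in TopologicalSorter(deps).static_order():
        # A step is stale if any of its direct dependencies is stale.
        if stale & deps.get(step, set()):
            stale.add(step)
            to_run.append(step)
    return to_run


if __name__ == "__main__":
    # Only raw_b changed, so clean_a is left alone.
    print(stale_steps(DEPS, {"raw_b"}))
```

With a fixed cron schedule, every step would typically rerun on every tick. Here a change to `raw_b` triggers only `clean_b`, `joined`, and `model`, and the same function tells you what to rerun after a failed step is fixed.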