

by neutralino1 1450 days ago

> - How does it handle failure of individual tasks in the pipeline?

At this time there is no failure handling (Sematic is 6 weeks old :). In the near future we will have fault-tolerance mechanisms: retries and try/except.
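Until built-in retries land, one stopgap is to wrap the flaky logic a step calls in a plain Python retry decorator. This is a minimal sketch, not Sematic's API: `retry`, `flaky_step` and their parameters are hypothetical names, and the sketch assumes the failure is transient, so simply calling the function again can succeed.

```python
import functools
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    attempts: int = 3,
    delay_seconds: float = 0.0,
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Hypothetical helper: re-run the wrapped function up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(step: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(step)
        def wrapper(*args, **kwargs) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return step(*args, **kwargs)
                except exceptions:
                    if attempt == attempts:
                        raise  # out of retries: surface the last error
                    time.sleep(delay_seconds)  # back off before the next try
            raise AssertionError("unreachable")

        return wrapper

    return decorator


calls = {"n": 0}


@retry(attempts=3)
def flaky_step(x: int) -> int:
    # Hypothetical step that fails twice with a transient error, then succeeds.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return x * 2


print(flaky_step(21))  # 42, after two retried failures
```

Retrying only on an explicit tuple of exception types keeps genuine bugs (e.g. a `TypeError`) from being silently re-run when you narrow `exceptions`.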

> - What if the underlying jobs need to run outside the k8s cluster?

You are free to launch jobs on third-party platforms from one of your pipeline steps. This is a pretty common pattern, for instance launching a Spark job, or a training job on a dedicated GPU cluster. In this case, the pipeline step that launches the job (the Sematic function) needs to wait for the third-party job to complete, or pass a reference to the job to a downstream step that will do the waiting.
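The "pass a reference downstream" variant can be sketched in plain Python: one step submits the external job and returns a small serializable handle, and a later step polls until the job reaches a terminal state. Everything here is hypothetical: `FakeJobClient` stands in for a real Spark or GPU-cluster client, and `JobRef`, `launch_training` and `wait_for_job` are illustrative names, not Sematic or vendor APIs.

```python
import itertools
import time
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class JobRef:
    """Hypothetical serializable handle to a job running outside the cluster."""
    job_id: str


class FakeJobClient:
    """Hypothetical stand-in for a third-party job client."""

    def __init__(self, polls_until_done: int = 2) -> None:
        self._polls_until_done = polls_until_done
        self._polls: Dict[str, int] = {}
        self._ids = itertools.count(1)

    def submit(self, config: str) -> str:
        # A real client would ship `config` to the platform; the fake ignores it.
        job_id = f"job-{next(self._ids)}"
        self._polls[job_id] = 0
        return job_id

    def status(self, job_id: str) -> str:
        # Report RUNNING until the job has been polled enough times.
        self._polls[job_id] += 1
        if self._polls[job_id] >= self._polls_until_done:
            return "SUCCEEDED"
        return "RUNNING"


def launch_training(client: FakeJobClient, config: str) -> JobRef:
    # Upstream step: submit and return immediately with a reference.
    return JobRef(client.submit(config))


def wait_for_job(
    client: FakeJobClient,
    ref: JobRef,
    poll_seconds: float = 0.0,
    timeout_seconds: float = 3600.0,
) -> str:
    # Downstream step: block until the external job finishes or times out.
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        state = client.status(ref.job_id)
        if state in ("SUCCEEDED", "FAILED"):
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"{ref.job_id} did not finish within {timeout_seconds}s")


client = FakeJobClient(polls_until_done=3)
ref = launch_training(client, "train.yaml")
print(ref.job_id, wait_for_job(client, ref))  # job-1 SUCCEEDED
```

Splitting submit and wait into two steps means the launching step finishes quickly, and only the waiting step holds resources while the external job runs.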

> - How does caching work?

At this time there is no caching (as mentioned, Sematic is very new :). We will implement memoization soon. In the meantime, you can run a data processing pipeline separately and then use the generated dataset as input to other pipelines. This is a pretty common pattern: having a number of sub-pipelines (e.g. a data processing loop, a train/eval loop, a testing/metrics loop, etc.) that you can run independently, but that you can also put together in an end-to-end pipeline for automation. Sematic lets you nest pipelines in arbitrary ways, and each sub-pipeline can still have its own entry point for independent execution.
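In plain Python the sub-pipeline pattern looks roughly like this: each sub-pipeline is an ordinary function you can call on its own, its output can be produced once and reused, and an end-to-end pipeline just chains them. This deliberately leaves Sematic out (no decorators or runners); the function names, the toy dataset and the least-squares "training" are hypothetical stand-ins.

```python
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]


def data_pipeline(raw: Sequence[float]) -> Dataset:
    """Sub-pipeline: turn raw values into (feature, label) pairs (toy logic)."""
    return [(x, 2.0 * x + 1.0) for x in raw]


def train_eval_pipeline(dataset: Dataset) -> float:
    """Sub-pipeline: least-squares fit of y = w*x + b, returns training MSE."""
    n = len(dataset)
    mean_x = sum(x for x, _ in dataset) / n
    mean_y = sum(y for _, y in dataset) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in dataset)
    var = sum((x - mean_x) ** 2 for x, _ in dataset)  # assumes x values differ
    w = cov / var
    b = mean_y - w * mean_x
    return sum((w * x + b - y) ** 2 for x, y in dataset) / n


def end_to_end_pipeline(raw: Sequence[float]) -> float:
    """Nests both sub-pipelines so the whole loop can run unattended."""
    return train_eval_pipeline(data_pipeline(raw))


raw = [0.0, 1.0, 2.0, 3.0]
dataset = data_pipeline(raw)          # run once, then reuse as an input
print(train_eval_pipeline(dataset))   # 0.0
print(end_to_end_pipeline(raw))       # 0.0
```

Reusing `dataset` across several training runs is a manual form of the caching that memoization would later automate.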