	by closeparen 2371 days ago
	No. The tasks inside a workflow, concretely, would be things like executing a Spark job, executing a SQL query, or downloading a CSV from the internet to HDFS and loading it as a Hive table. Think fancy cron that deals correctly with failures in multistage processes. The number of pipelines and executions is a function of the complexity of your application, and independent of the number of records being processed by the batch jobs within those workflows.
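
	To make the "fancy cron" idea concrete, here is a rough sketch of the kind of bookkeeping such a tool does. Everything in it (`Task`, `run_workflow`, the stub task functions) is a made-up name for illustration, not the API of any real workflow engine. It runs stages in order, retries a failed stage a bounded number of times, and refuses to start downstream stages once one has failed for good.

```python
import time
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Task:
    """One stage of a workflow (hypothetical type, for illustration only)."""
    name: str
    run: Callable[[], None]
    max_retries: int = 2


def run_workflow(
    tasks: List[Task], retry_delay_s: float = 0.0
) -> Tuple[List[str], Optional[str]]:
    """Run tasks in order, retrying failures.

    Returns (names of completed tasks, name of the task that failed or None).
    Downstream tasks are skipped once a task exhausts its retries.
    """
    completed: List[str] = []
    for task in tasks:
        for attempt in range(task.max_retries + 1):
            try:
                task.run()
                completed.append(task.name)
                break
            except Exception:
                if attempt == task.max_retries:
                    return completed, task.name  # give up, skip the rest
                time.sleep(retry_delay_s)
    return completed, None


# Stub stages standing in for real work (assumed names, no real I/O).
attempts = {"download": 0}


def flaky_download() -> None:
    attempts["download"] += 1
    if attempts["download"] < 2:
        raise IOError("transient network error")


def load_hive_table() -> None:
    pass


def run_spark_job() -> None:
    pass


workflow = [
    Task("download_csv", flaky_download),
    Task("load_hive_table", load_hive_table),
    Task("run_spark_job", run_spark_job),
]
completed, failed = run_workflow(workflow)
```

	Note that the runner does one unit of work per stage however many records the Spark job or SQL query touches, which is why its load tracks the number of workflows and stages rather than the data volume.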