by closeparen
2371 days ago
No. The tasks inside a workflow would concretely be things like executing a Spark job, running a SQL query, or downloading a CSV from the internet to HDFS and loading it as a Hive table. Think fancy cron that deals correctly with failures in multistage processes. The number of pipelines and executions is a function of the complexity of your application, and independent of the number of records processed by the batch jobs within those workflows.
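To make the "fancy cron" idea concrete, here is a rough sketch of what dealing with failures in a multistage process could look like. It is not how any particular workflow engine does it. `run_workflow`, the stage names, and the retry policy are all hypothetical, and the stages are plain Python callables standing in for things like a Spark job or a Hive load.

```python
from typing import Callable, List, Optional, Tuple

# A stage is a (name, callable) pair; the callable raises on failure.
Stage = Tuple[str, Callable[[], None]]


def run_workflow(
    stages: List[Stage], max_retries: int = 2
) -> Tuple[List[str], Optional[str]]:
    """Run stages in order, retrying a failing stage up to max_retries times.

    Returns (completed stage names, name of the stage that failed or None).
    Downstream stages are skipped once a stage exhausts its retries.
    """
    completed: List[str] = []
    for name, task in stages:
        for attempt in range(1, max_retries + 2):
            try:
                task()
            except Exception as exc:  # hypothetical policy: retry on any error
                print(f"{name}: attempt {attempt} failed: {exc}")
            else:
                completed.append(name)
                break
        else:
            # Every attempt failed, so stop instead of running later stages.
            return completed, name
    return completed, None
```

Note that the loop runs once per stage, not once per record: the cost of orchestration depends on how many stages and workflows you define, while the heavy lifting happens inside each task.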