| If you've ever used a tool similar to HP Operations Orchestration, it's pretty much a stripped-down version of that. Cons: - It's not very stable. (It takes a lot of configuration to get it to run more than one process at a time.) - It's very easy to make the UI fall over. - It's very difficult to stop tasks and jobs once they've started. (You can delete, stop, or cancel a job, but under the covers it keeps running, and your next iteration has to wait for it, if it ever completes before you get through the develop-and-test cycle again.) - It's written in Python, so expect issues with your environment. The latest version of Airflow doesn't work with Python 3.7.x or so because async was made a reserved keyword. There goes that method. - There is no sharing of data from one process to another (XComs are frowned upon). This means that if you're trying to pull data from S3, you have to hard-code a predictable location for it. The next operation acts completely independently and works from that location. - The connections between tasks are superficial. They exist only to order the tasks the way you specified. It can also be a bit difficult to debug when you have multiple layers and multiple dependency declarations where something is both upstream and downstream of the same dependency. - No optimization. It will not split up the work per task. You have to define that work manually. (See the next complaint.) - No dynamic tasks or DAGs. You cannot generate a new DAG or task after the DAG is initialized. That means that if you have to perform 1,000,000,000,000 API calls, you can't just break them up into 200 API calls per task and then max out the capacity of your workers. - The example they had of a DAG with thousands of tasks is bad practice. The DAG timeouts will be reached before it completes, and it'll try to restart on a schedule. |
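To make the "define that work manually" and "predictable place" complaints concrete, here is a minimal plain-Python sketch of the kind of bookkeeping you end up writing yourself. It doesn't use any Airflow API. The function names, the key layout, and the run date are all made up for illustration; only the 200-calls-per-task figure comes from the review.

```python
# Hypothetical sketch: plan fixed-size batches up front, because the work
# has to be divided by hand before the DAG is initialized.
from typing import List, Tuple

BATCH_SIZE = 200  # API calls per task, the figure used in the review


def plan_batches(total_calls: int, batch_size: int = BATCH_SIZE) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges that cover range(total_calls)."""
    if total_calls < 0 or batch_size <= 0:
        raise ValueError("total_calls must be >= 0 and batch_size must be > 0")
    return [
        (start, min(start + batch_size, total_calls))
        for start in range(0, total_calls, batch_size)
    ]


def batch_key(run_date: str, start: int, end: int) -> str:
    """Build a predictable object key so a later task can locate the output
    without any data being handed over between tasks. The layout is invented."""
    return f"exports/{run_date}/calls_{start:012d}_{end:012d}.json"


# Example: 1,000 calls become five batches, each with a key known in advance.
batches = plan_batches(1_000)
keys = [batch_key("2019-01-01", start, end) for start, end in batches]
```

Each (start, end) pair would become one statically declared task, and both the writer and the reader of a batch would call the same key function, so nothing has to be passed between them at run time.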