| For testing: - We have a dedicated dev environment that gives analysts a dev/test loop. Unfortunately, none of the pipelines can be run locally. - We have CI jobs and unit tests that run on all pipelines. Observability: - We have data quality checks for each dataset, organized by tier. These also integrate with our alerting system to page when data quality dips. - Airflow and our query engines (Hive/Spark/Presto) each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets, but it doesn't work at the column level because our internal version of Hive doesn't support that. - We have a service that essentially surfaces observability metrics for pipelines in a nice UI. - Our Airflow is integrated with PagerDuty to page the owning team when a pipeline fails. We'd like to do more, but nobody has really put in the work to build a good static analysis system for Airflow/Python. Couple that with the lack of out-of-the-box support for column-level lineage and it's easy to get into a mess. For large migrations (Airflow/infra/Python/dependency changes) we still end up doing ad hoc analysis to make sure things go right, and we often miss important things. Happy to talk more about this if you're interested. |