Hacker News new | ask | show | jobs
by whytai 1065 days ago
For testing:

- we have a dedicated dev environment for analysts to experience a dev/test loop. None of the pipelines can be run locally unfortunately.

- we have CI jobs and unit tests that are run on all pipelines

Observability:

- we have data quality checks for each dataset, organized by tier. This also integrates with our alerting system to send pagers when data quality dips.

- Airflow and our query engines hive/spark/presto each integrate with our in-house lineage service. We have a lineage graph that shows which pipelines produce/consume which assets but it doesn't work at the column level because our internal version of Hive doesn't support that.

- we have a service that essential surfaces observability metrics for pipelines in a nice ui

- our airflow is integrated with pagerduty to send pagers to owning teams when pipelines fail.

We'd like to do more, but nobody has really put in the work to make a good static analysis system for airflow/python. Couple that with the lack of support for column level lineage OOTB and it's easy to get into a mess. For large migrations (airflow/infra/python/dependecy changes) we still end up doing adhoc analysis to make sure things go right, and we often miss important things.

Happy to talk more about this if you're interested.