|
|
|
|
|
by whytai
1060 days ago
|
|
How do you guys do the static analysis on the queries? I notice you support dbt, bigquery etc, but all of our companies pipelines are in airflow. That makes the static analysis difficult because we're dealing with arbitrary python code that programmatically generates queries :). Any plans to support airflow in the future? Would love to have something like this for our companies 500k+ airflow jobs. |
|
More generally we can embed the transformation logic of each stage of your data pipelines into the edge between nodes (like two columns). Like you said, in the case of SQL there are lots of ways to statically analyze that pipeline but it becomes much more complicated with something like pure python.
As an intermediate solution you can manually curate data contracts or assertions about application behavior into Grai but these inevitably fall out of sync with the code.
Airflow has a really great API for exposing task level lineage but we've held off integrating it because we weren't sure how to convert that into robust column or field level lineage as well. How are y'all handling testing / observability at the moment?