
You're talking about more debugging tools within the transformation steps of a pipeline, right? dbt helps with that via data tests (see https://gitlab.com/meltano/analytics/tree/master/elt/dbt/tes... for an example).
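For anyone unfamiliar with the idea: a data test is typically a query that selects the rows violating an expectation, and the test passes when that query returns nothing. Here's a rough sketch of that pattern in plain Python with SQLite standing in for the warehouse. The `orders` table, the `run_data_test` helper, and the negative-amount rule are all made up for illustration; this is not dbt's API or anything from our repo.

```python
import sqlite3


def run_data_test(conn, failing_rows_sql):
    """Run a query that selects rule-violating rows; an empty result means the test passes."""
    return conn.execute(failing_rows_sql).fetchall()


# Hypothetical stand-in for a warehouse table (assumption, not from the original post).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, -5.0), (3, 7.5)],
)

# Hypothetical expectation: no order should have a negative amount.
failures = run_data_test(conn, "SELECT id, amount FROM orders WHERE amount < 0")
print("PASS" if not failures else f"FAIL: {failures}")
```

The nice property is that a failing test hands you the offending rows directly, which is most of what you want when debugging a transformation step.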

I'd love a more robust way to test data pipelines and the data within them generally. I was at DataEngConf earlier this year, and many people were talking about exactly this problem. One way we're trying to address it is by using the Review Apps feature on Merge Requests within GitLab. Right now, when you open an MR on our repo, it creates a clone of the data warehouse that's completely isolated from production. This obviously can't scale once the DW grows beyond a certain size, but I think there are ways to keep this sort of practice going.
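To make the isolation idea concrete, here's a toy sketch of the clone-per-MR pattern, again with SQLite as a stand-in. Our actual setup runs through GitLab CI against a real warehouse, so treat the `clone_warehouse` helper and the `users` table as hypothetical names for illustration only.

```python
import sqlite3


def clone_warehouse(prod):
    """Copy every table from `prod` into a fresh, independent in-memory database."""
    clone = sqlite3.connect(":memory:")
    prod.backup(clone)  # sqlite3.Connection.backup, available since Python 3.7
    return clone


# Hypothetical "production" warehouse (assumption for the sketch).
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE users (id INTEGER, name TEXT)")
prod.execute("INSERT INTO users VALUES (1, 'alice')")
prod.commit()

# Each merge request would get its own clone to run the pipeline against.
review = clone_warehouse(prod)
review.execute("DELETE FROM users")  # a destructive change under review
review.commit()

prod_count = prod.execute("SELECT COUNT(*) FROM users").fetchone()[0]
review_count = review.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(f"prod rows: {prod_count}, review rows: {review_count}")
```

A full copy like this is exactly the part that stops scaling; the sketch only shows why it's attractive, since nothing done in the review clone can touch production.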