by sails 2131 days ago
Very interesting tool. I am trying to do this with Dataform/Looker, and I feel some kind of inference like the one below would be great.

> this table tends to update every 30-40 minutes so we’ll set a threshold at an hour

Can you achieve these tests with metadata or do you need 100% read access to the database?

I also wonder if this would work as part of an Analytics Engineering CI/CD process? Something like how dbt Cloud will block pull requests that fail certain criteria.

1 comment

Metadata is a valuable place for finding information like load times and rows inserted/updated. Currently we just rely on read access and raw SQL. A common way users are doing this now (and how we do it internally for our analytics data) is to use, for example, the Fivetran logs table to monitor ingestion times and inserted rows, rather than querying the raw tables.
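To make the log-table idea concrete, here is a rough sketch of a freshness check that reads only a load log, not the raw table. Everything in it is an assumption for illustration: the `ingestion_log` table and its columns are hypothetical (this is not Fivetran's actual log schema), SQLite stands in for the warehouse, and the one-hour threshold is just the example quoted above.

```python
import sqlite3
from datetime import datetime, timedelta


def latest_load(conn, table_name):
    """Return the most recent load time logged for table_name, or None."""
    # ISO-8601 strings sort chronologically, so MAX() works on the text column.
    row = conn.execute(
        "SELECT MAX(loaded_at) FROM ingestion_log WHERE table_name = ?",
        (table_name,),
    ).fetchone()
    return datetime.fromisoformat(row[0]) if row[0] else None


def is_stale(conn, table_name, now, threshold=timedelta(hours=1)):
    """True if the table has no logged load within `threshold` of `now`."""
    last = latest_load(conn, table_name)
    return last is None or now - last > threshold


# Hypothetical log table with two loads roughly 35 minutes apart.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ingestion_log "
    "(table_name TEXT, loaded_at TEXT, rows_inserted INTEGER)"
)
conn.executemany(
    "INSERT INTO ingestion_log VALUES (?, ?, ?)",
    [
        ("orders", "2020-01-01T10:00:00", 120),
        ("orders", "2020-01-01T10:35:00", 98),
    ],
)

# 25 minutes after the last load: within the one-hour threshold.
print(is_stale(conn, "orders", datetime(2020, 1, 1, 11, 0)))  # False
```

The check never touches the `orders` table itself, so it only needs read access to the log.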

For CI/CD, absolutely, we want to support this, as well as stopping/conditional execution in DAGs (e.g. Airflow). We’re launching webhooks very soon.
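Until then, one generic way to gate a pipeline is to turn check results into a process exit code, since most CI systems fail a job on a non-zero exit. This is only a sketch of that pattern, not how our product or dbt Cloud does it; the `exit_code` helper and the check names are made up for illustration.

```python
import sys


def exit_code(check_results):
    """Return 0 if every check passed, 1 otherwise.

    check_results maps a (hypothetical) check name to True/False.
    """
    failed = [name for name, passed in check_results.items() if not passed]
    for name in failed:
        # Report failures on stderr so they show up in the CI job log.
        print(f"FAILED: {name}", file=sys.stderr)
    return 1 if failed else 0


# In a CI step you would end the script with:
#     sys.exit(exit_code(results))
results = {"orders_freshness": True, "orders_row_count": False}
status = exit_code(results)
```

The same return value could drive a conditional branch in a DAG instead of failing the whole run.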