by oliver101
2133 days ago
Thanks! We love dbt and take a lot of inspiration from their work. We're putting a lot of effort into suggesting the right tests based on data types, sources, and field names. Many of these tests are repetitive to write, so we want to make them easy to spin up. We've also found that keeping a history of the warehouse's state over time is really useful context for deciding whether a test has failed (example: this table tends to update every 30-40 minutes, so we'll set a threshold at an hour). We also handle scheduling, which is surprisingly annoying to manage (we built a couple of internal tools for it in the past). That's something we really missed with Great Expectations (dbt Cloud gives you this).
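A minimal sketch of the kind of history-based freshness threshold described above. The padding multiplier (`margin`) and the round-up-to-the-hour rule are my own assumptions for illustration, not necessarily how the product computes it; with observed gaps of 30-40 minutes this heuristic lands on a one-hour threshold.

```python
import math
from datetime import datetime, timedelta


def freshness_threshold(update_times, margin=1.5, round_to=timedelta(hours=1)):
    """Hypothetical heuristic: take the largest observed gap between updates,
    pad it by `margin`, then round up to a multiple of `round_to`."""
    times = sorted(update_times)
    if len(times) < 2:
        raise ValueError("need at least two updates to estimate a cadence")
    # Largest gap between consecutive observed updates.
    max_gap = max(later - earlier for earlier, later in zip(times, times[1:]))
    padded = max_gap * margin
    # timedelta / timedelta gives a float; round up to whole steps.
    steps = math.ceil(padded / round_to)
    return steps * round_to


def is_stale(last_update, now, threshold):
    """A table counts as stale once it has gone longer than `threshold` without an update."""
    return now - last_update > threshold


# Example history: gaps of 30, 35 and 40 minutes between updates.
base = datetime(2020, 1, 1)
updates = [base + timedelta(minutes=m) for m in (0, 30, 65, 105)]
threshold = freshness_threshold(updates)
```

Rounding up to a coarse unit keeps the alert from firing on ordinary jitter in the update cadence, at the cost of detecting a stalled table a bit later.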
Testing files is an interesting use case; to an extent we support it through Athena or BigQuery external tables over JSON/CSV/Parquet. We're intentionally limiting it to SQL for now.
> this table tends to update every 30-40 minutes so we’ll set a threshold at an hour
Can you run these tests using metadata alone, or do you need full read access to the database?
I also wonder whether this would work as part of an analytics engineering CI/CD process, similar to how dbt Cloud can block pull requests that fail certain checks.