Hacker News
by iknownothow 1019 days ago
You could approach data quality testing the same way you'd test any other piece of software: by writing tests. We use dbt, which makes it very easy to write tests against models (think tables in a db).

For example, say you have a regional_orders table. You write tests in SQL to check your assumptions about that data:

* I expect the regional_orders table to contain no duplicate entries.

* I expect every order in regional_orders to ship to one specific region only.

* And so on...
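
To make the two expectations above concrete, here is a rough sketch rather than our actual dbt project. It follows the convention dbt's singular tests use, where each test is a query that selects the rows violating an assumption and passes when it returns nothing. The SQL is wrapped in Python's sqlite3 module only so it runs on its own. The column names (order_id, ship_region), the 'EMEA' region and the sample rows are all made up for illustration.

```python
import sqlite3

# Each check is a query that returns the rows violating an assumption,
# so an empty result means the check passes.
# Column names and the 'EMEA' region are hypothetical.
CHECKS = {
    "no_duplicate_orders": """
        SELECT order_id, COUNT(*) AS n
        FROM regional_orders
        GROUP BY order_id
        HAVING COUNT(*) > 1
    """,
    "ships_only_to_emea": """
        SELECT order_id, ship_region
        FROM regional_orders
        WHERE ship_region IS NULL OR ship_region <> 'EMEA'
    """,
}


def run_checks(conn):
    """Return {check_name: failing_rows}; an empty list means the check passed."""
    return {name: conn.execute(sql).fetchall() for name, sql in CHECKS.items()}


def build_demo_db():
    """Build an in-memory table with one duplicate and one out-of-region order."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE regional_orders (order_id INTEGER, ship_region TEXT)"
    )
    conn.executemany(
        "INSERT INTO regional_orders VALUES (?, ?)",
        [(1, "EMEA"), (2, "EMEA"), (2, "EMEA"), (3, "APAC")],
    )
    return conn


if __name__ == "__main__":
    for name, failures in run_checks(build_demo_db()).items():
        status = "PASS" if not failures else f"FAIL {failures}"
        print(name, status)
```

Returning the failing rows instead of a bare true/false means a failed check already tells you which records to go look at.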

This has worked fairly well so far for me. But are these kinds of tests sufficient? Am I missing something?