by verhey
2130 days ago
How does hubble compare to Great Expectations or DBT for pipeline testing? It looks like there's more emphasis on automated profiling than on "having to write and maintain lots of individual tests", and obviously hubble being a SaaS offering is the big difference? Also, any plans to profile and test file-based stores as well? There's a lot that can go wrong in a pipeline before data even reaches BigQuery or Snowflake, and you may help your customers save money if you could profile data in S3 before it goes through a potentially expensive transform process. Best of luck, though! Data testing is a very real need in most data organizations I've been in, and I'm glad more and more tools seem to be popping up recently to help with it.
We’ve also found that keeping a history of the state of the warehouse over time is really useful context for determining whether a test has failed (example: this table tends to update every 30-40 minutes, so we’ll set a threshold at an hour).
We also handle the scheduling, which is surprisingly annoying to manage (we built a couple of internal tools for this in the past). That’s something we really missed with Great Expectations (you get this with DBT Cloud). Testing files is an interesting use case; to an extent we support this using Athena or BigQuery external tables for JSON/CSV/Parquet. We’re intentionally limiting it to SQL for now.
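To make the freshness example concrete, here is a rough sketch of how a staleness threshold could be derived from a table's update history. This is not how hubble does it; the function names, the `slack` multiplier, and the "longest observed gap times 1.5" rule are all assumptions picked so that a 30-40 minute cadence lands on a one-hour threshold.

```python
from datetime import datetime, timedelta


def freshness_threshold(update_times, slack=1.5):
    """Return a staleness threshold based on past update timestamps.

    Hypothetical rule: take the longest gap seen between consecutive
    updates and multiply it by `slack` to leave some headroom.
    """
    gaps = [later - earlier for earlier, later in zip(update_times, update_times[1:])]
    if not gaps:
        raise ValueError("need at least two update timestamps")
    return max(gaps) * slack


def is_stale(update_times, now, slack=1.5):
    """True if the time since the last update exceeds the derived threshold."""
    return now - update_times[-1] > freshness_threshold(update_times, slack)


# Illustrative history: a table that updated 30, 40, and 35 minutes apart.
start = datetime(2020, 1, 1, 12, 0)
history = [start + timedelta(minutes=m) for m in (0, 30, 70, 105)]
threshold = freshness_threshold(history)
```

With this history the longest gap is 40 minutes, so the threshold comes out at one hour; a check run 59 minutes after the last update would pass and one run 61 minutes after would fail.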