by pitah1
879 days ago
Grats on building this out. I think there is a lot of potential in this space. I very much understand the challenges of financial/regulatory reporting and data quality :). A couple of things I noticed. You mention "automates root cause analysis". By this, I assume you mean showing which rows caused the metrics to go out of bounds? Or is there something else I'm missing? How do users define metrics? I've found this to be a challenge, especially given that you may be giving this tool to non-technical users. Does this support real-time data sources such as Kafka? Do you plan on supporting cross-dataset validations (e.g. relationships such as: the account_id of a transaction should exist in the accounts table)?
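The cross-dataset validation asked about here is essentially a referential-integrity check (an anti-join): find child rows whose foreign key has no matching parent row. Below is a minimal sketch in plain Python, assuming both tables fit in memory as lists of dicts. The function name `find_orphans` and the sample columns are hypothetical, for illustration only, and not part of Datadrift or dbt.

```python
from typing import Any, Iterable


def find_orphans(
    child_rows: Iterable[dict[str, Any]],
    parent_rows: Iterable[dict[str, Any]],
    child_key: str,
    parent_key: str,
) -> list[dict[str, Any]]:
    """Return child rows whose foreign key has no matching parent key (an anti-join).

    Hypothetical helper for illustration only.
    """
    # Build the set of parent keys once so each lookup is O(1).
    parent_keys = {row[parent_key] for row in parent_rows}
    # Keep the child rows whose key is missing from the parent table.
    return [row for row in child_rows if row[child_key] not in parent_keys]


# Hypothetical sample data: account 3 does not exist in `accounts`.
accounts = [{"account_id": 1}, {"account_id": 2}]
transactions = [
    {"txn_id": 10, "account_id": 1, "amount": 50.0},
    {"txn_id": 11, "account_id": 3, "amount": 20.0},
]

orphans = find_orphans(transactions, accounts, "account_id", "account_id")
print(orphans)  # [{'txn_id': 11, 'account_id': 3, 'amount': 20.0}]
```

In a warehouse, the same check is usually written in SQL as a LEFT JOIN from the child table to the parent table, filtered on a missing parent key.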
- "automates root cause analysis" -> it means (1) showing which rows affected the metrics and (2) providing some automated context (is it an update? a delete? a dimension that changed? etc.). But (2) is still at a very early stage.
- Metrics are defined by users in their usual "data" repository (using dbt, for example). The metric computation is not defined in Datadrift; we only read it.
- No, it's really meant for batch processing in a data warehouse (e.g. hourly/daily computations).
- That's not something we had in mind (I know some dbt packages can help you do this).
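The root cause analysis described in the first answer (which rows moved a metric, and whether each one was an insert, a delete, an update, or a dimension change) can be pictured as a keyed diff between two snapshots of the same table. Below is a minimal sketch of that idea, not Datadrift's actual implementation. It assumes each snapshot fits in memory as a dict keyed by a sortable primary key and that the metric is a plain SUM over one column. The helpers `diff_snapshots` and `metric_contributions` and the sample columns are hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def diff_snapshots(old: dict[Any, Row], new: dict[Any, Row]) -> dict[str, list]:
    """Classify primary keys as added, deleted, or updated between two snapshots.

    Hypothetical helper; keys must be sortable for deterministic output.
    """
    added = sorted(k for k in new if k not in old)
    deleted = sorted(k for k in old if k not in new)
    updated = []
    for key in sorted(old.keys() & new.keys()):
        # Record which columns changed value for this row.
        changed = sorted(
            col for col in old[key].keys() | new[key].keys()
            if old[key].get(col) != new[key].get(col)
        )
        if changed:
            updated.append((key, changed))
    return {"added": added, "deleted": deleted, "updated": updated}


def metric_contributions(old: dict[Any, Row], new: dict[Any, Row], column: str) -> dict[Any, Any]:
    """Per-row change in a SUM(column) metric; a missing row contributes 0."""
    deltas = {}
    for key in sorted(old.keys() | new.keys()):
        before = old.get(key, {}).get(column, 0)
        after = new.get(key, {}).get(column, 0)
        if after != before:
            deltas[key] = after - before
    return deltas


# Hypothetical snapshots of a revenue table from two consecutive daily runs.
old = {
    1: {"region": "EU", "revenue": 100},
    2: {"region": "US", "revenue": 50},
    3: {"region": "US", "revenue": 30},
}
new = {
    1: {"region": "EU", "revenue": 130},
    2: {"region": "EU", "revenue": 50},
    4: {"region": "US", "revenue": 10},
}

result = diff_snapshots(old, new)
deltas = metric_contributions(old, new, "revenue")
print(result)  # {'added': [4], 'deleted': [3], 'updated': [(1, ['revenue']), (2, ['region'])]}
print(deltas)  # {1: 30, 3: -30, 4: 10}
```

Row 2 shows the "dimension that changed" case. Total revenue is unaffected, but a per-region breakdown of the same metric would move 50 from US to EU.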