|
|
|
|
|
by Nican
1938 days ago
|
|
Thank you for taking your time to write that up. To illustrate my point a little more: Suppose a customer has several resources, and each resource has several metrics. From my understanding, Materialized could be used to have an aggregated view of metrics per customer. The problem is that resources can also be migrated between customers. When a resource migrates between customers, the whole history of the customer changes. This could cause huge updates depended on how many resources are moved, or how many metrics per resource are being collected. I have a conundrum between doing the "customer-resource join" late, and causing huge CPU cost when running queries. Or making aggregates early, and then having huge Disk cost when migrating resources. At the moment, we just have daily jobs that aggregates the TBs of customer data daily, because there is no way to do the joins in real-time. Is Materialize designed to be able to handle something like this? |
|
> When a resource migrates between customers, the whole history of the customer changes.
I'm a bit confused about this part. In the query I posted above, Differential's incremental join algorithm would spare you from recomputing the entire join. (As written the query would have to re-aggregate the metrics associated with that resource that moved, but that could be solved by manually pushing the avg down.)
If you'd like to drop by our Community Slack (https://materialize.com/s/chat) or GitHub issues (https://github.com/MaterializeInc/materialize/issues) we'd be happy to talk through things in more detail!