|
|
|
|
|
by zokier
565 days ago
|
|
> the QP can ask storage to do work like filtering, aggregation, projection, and other common tasks on its behalf. Unlike SQL designs that build on K/V stores, this allows to DSQL to do much of the heavy lifting of filtering and finding data right next to the data itself, on the storage replicas, without sacrificing scalability of storage or compute. I don't know much about DB internals, but to me this sounds like lot of the compute is getting aggregated to the storage layer. I would think that "filtering, aggregation, projection" is fairly big chunk of the computation that typical DB does? |
|
> Many SQL aggregations are monotonic operations (e.g. MAX, SUM, etc) that can be partially completed on each node and then post-merged. Some (e.g. DISTINCT) can be transformed into monotonic ops with some effort. Some aren't possible to do this way. (Ref on monotonicity: arxiv.org/pdf/1901.01930)
The benefit of this is that a lot more work is done _close_ to the data. The trend is that bandwidth is getting larger in data centers, but latency isn't improving at the same rate. Reducing the number of round trips between QP and storage greatly improves the overall query latency, even if you have to do more work on the storage.