| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by asavinov 1970 days ago

Yes, this is the basic logic: for any incremental aggregation we need to detect groups which can be influenced by this new record or updated record. If we do row-based rolling aggregation then then indeed we need to update records (i-n, i+n). Yet, the following difficulties may arise:

o Generally, we do not want to re-compute aggregates - aggregates should be also updated, particularly, if n is very large

o In real applications, rolling aggregation is performed using partitioning on some objects. For example, we append new events from many different devices to one table and want to compute rolling aggregates for each individual device. Hence, this (i-n, i+n) will not work anymore.

o Rolling aggregation using absolute time windows will also work differently. Although, if records are ordered (like in stream processing) and there are no partitions, then it is easy.

1 comments

scott_s 1967 days ago

Myself and a few others have done a lot of research on performing sliding window aggregations updates without recomputing everything. Our code is on github, and the README has links to the papers: https://github.com/IBM/sliding-window-aggregators

link