by arriu 1698 days ago
Thanks for the answer! I'd love to know more :) Also, I'm not following how you guys deal with unique counts. For example, let's say you've got 100 unique visitors on Monday and 100 on Tuesday. The unique visitors across both days might be anywhere between 100 and 200, and averaging counts between days doesn't work.

Not sure about this specific implementation, but normally you handle this with approximations that support merging, e.g. HyperLogLog. You can merge two HyperLogLog counters to maintain proper distinct counts.
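
To make the merging idea concrete, here is a minimal Python sketch of a HyperLogLog counter. It is a toy for illustration, not TimescaleDB's implementation: the class name, the SHA-1-based hash, the register count (2**12), and the visitor IDs are all assumptions of mine. The point is that merging is just an element-wise max over registers, so the merged counter is identical to one built from the combined stream.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Derive a 64-bit hash from SHA-1 (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                  # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Merging two counters is an element-wise max of their registers.
        assert self.p == other.p, "counters must use the same precision"
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)     # bias constant, valid for m >= 128
        estimate = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return round(estimate)


# Hypothetical data: 100 visitors per day, 50 of whom show up on both days.
monday, tuesday = HyperLogLog(), HyperLogLog()
for i in range(100):
    monday.add(f"visitor-{i}")
for i in range(50, 150):
    tuesday.add(f"visitor-{i}")

merged = monday.merge(tuesday)
print(monday.count(), tuesday.count(), merged.count())
```

The merged estimate should land near 150 (the true number of distinct visitors), rather than the 200 you would get by summing the daily counts.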
Yep!

And in fact, that's exactly what TimescaleDB supports - things like hyperloglog to support approximate count distinct, including as part of continuous aggregates. [0]

This blog post - "How PostgreSQL aggregation works and how it inspired our hyperfunctions’ design" - provides a really nice description of how the API design of some of our analytical functions is motivated by the ability to "split" processing into "pre-aggregation" and "finalization" steps, with the blog post focusing on the example of percentile approximation. (I think it was on HN a while back as well.) [1]
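
As a rough illustration of the "pre-aggregation" / "finalization" split, here is a small Python sketch using a plain average instead of percentile approximation. The names (`AvgState`, `accumulate`, `combine`, `finalize`) and the sample values are made up for this example and are not the hyperfunctions API. The idea is that each day stores a small mergeable state, states are combined across days, and the final number is computed only at the end.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Partial aggregate: enough information to merge later without the raw rows."""
    total: float = 0.0
    count: int = 0


def accumulate(state: AvgState, value: float) -> AvgState:
    # Pre-aggregation step: fold one raw value into the partial state.
    return AvgState(state.total + value, state.count + 1)


def combine(a: AvgState, b: AvgState) -> AvgState:
    # Merge two partial states (e.g. two daily buckets into a weekly one).
    return AvgState(a.total + b.total, a.count + b.count)


def finalize(state: AvgState) -> float:
    # Finalization step: turn the state into the user-facing result.
    return state.total / state.count


# Hypothetical per-day measurements.
monday_state = AvgState()
for v in [1.0, 2.0, 3.0]:
    monday_state = accumulate(monday_state, v)

tuesday_state = AvgState()
for v in [10.0, 20.0]:
    tuesday_state = accumulate(tuesday_state, v)

# Averaging the two finalized daily averages weights the days equally...
avg_of_avgs = (finalize(monday_state) + finalize(tuesday_state)) / 2
# ...while combining the states first gives the average over all rows.
overall = finalize(combine(monday_state, tuesday_state))
print(avg_of_avgs, overall)
```

The same shape applies to sketches like HyperLogLog: the stored state is the mergeable piece, and the distinct-count estimate is only computed at finalization.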

[0] https://blog.timescale.com/blog/introducing-hyperfunctions-n...

[1] https://blog.timescale.com/blog/how-postgresql-aggregation-w...

Awesome, thank you!