Hacker News new | ask | show | jobs
by gopalv 4602 days ago
HyperLogLog based COUNT DISTINCT APPROXIMATE, that's an interesting development in there (simple algorithm, but complex math).

I wonder what hashes they are using.

2 comments

Only tangentially related to your comment, but last year there was a series of postings "Damn Cool Algorithms" in which the last post covered HyperLogLog in the post on cardinality estimation [0]. Some discussion in the HackerNews submission [1].

[0] http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinali... [1] https://news.ycombinator.com/item?id=4488946

one of the biggest benefits of HLL is that if you are only interested in set cardinalities, you don't have to store the original items at all, just a small HLL data structure which can be used to compute set unions, and with some caveats, intersections.

It would be really great if Redshift supported HLL as a proper data type, and not just an optimization for COUNT(). A proper data type would allow us to store pre-aggregated data instead of billions of unnecessarily granular rows.