| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gopalv 4602 days ago
	HyperLogLog based COUNT DISTINCT APPROXIMATE, that's an interesting development in there (simple algorithm, but complex math). I wonder what hashes they are using.

2 comments

dmunoz 4602 days ago

Only tangentially related to your comment, but last year there was a series of postings "Damn Cool Algorithms" in which the last post covered HyperLogLog in the post on cardinality estimation [0]. Some discussion in the HackerNews submission [1].

[0] http://blog.notdot.net/2012/09/Dam-Cool-Algorithms-Cardinali... [1] https://news.ycombinator.com/item?id=4488946

link

vtuulos 4602 days ago

one of the biggest benefits of HLL is that if you are only interested in set cardinalities, you don't have to store the original items at all, just a small HLL data structure which can be used to compute set unions, and with some caveats, intersections.

It would be really great if Redshift supported HLL as a proper data type, and not just an optimization for COUNT(). A proper data type would allow us to store pre-aggregated data instead of billions of unnecessarily granular rows.

link