| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by matsur 3973 days ago
	Semi-related in the land of Postgres and probabilistic data structures -- Redshift supports APPROXIMATE COUNT. Much, much faster than a raw COUNT, and their stated error is +-2% http://docs.aws.amazon.com/redshift/latest/dg/r_COUNT.html

1 comments

striking 3973 days ago

I'd eat my hat if it didn't use a Bloom filter in some way.

link

usman-m 3973 days ago

It probably uses a HyperLogLog--the 2% error rate kind of gives it away. Bloom filters approximate set membership queries, HyperLogLogs approximate set cardinality queries. COUNT DISTINCT is a set cardinality query.

We actually support a HyperLogLog backed COUNT DISTINCT aggregate too: http://docs.pipelinedb.com/aggregates.html#general-aggregate...

link

striking 3972 days ago

Consider my metaphorical hat eaten. Thanks for the cool tools! I'm currently working with Postgres and this looks like a great thing to add to the mix.

link

anarazel 3972 days ago

There's a postgres extension that implements hll for postgres. Rather useful: https://github.com/aggregateknowledge/postgresql-hll

link