by usman-m 3969 days ago
It probably uses a HyperLogLog; the 2% error rate kind of gives it away. Bloom filters approximate set membership queries, whereas HyperLogLogs approximate set cardinality queries. COUNT DISTINCT is a set cardinality query.

We actually support a HyperLogLog-backed COUNT DISTINCT aggregate too: http://docs.pipelinedb.com/aggregates.html#general-aggregate...
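
For anyone curious how a cardinality estimate like that can work, here is a minimal Python sketch of a HyperLogLog. It is a toy under my own assumptions, not how PipelineDB or any particular database implements it: the class and method names are made up for illustration, SHA-1 is just a convenient stand-in hash, and the 2^12 register count is an arbitrary choice. The standard error of the basic estimator is roughly 1.04/sqrt(m) for m registers, so 4096 registers should land around 1.6%.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog (hypothetical names): estimates how many distinct items were added."""

    def __init__(self, p=12):
        self.p = p                      # number of hash bits used to pick a register
        self.m = 1 << p                 # number of registers (4096 when p=12)
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Assumption: the first 8 bytes of SHA-1 serve as a 64-bit hash.
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def add(self, item):
        x = self._hash64(item)
        width = 64 - self.p
        idx = x >> width                    # top p bits select the register
        rest = x & ((1 << width) - 1)       # remaining bits feed the rank
        # Rank = 1-based position of the leftmost 1 bit in the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Normalized harmonic mean of 2**register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Adding the same item twice leaves the registers unchanged, which is why duplicates don't inflate the count. Memory use is fixed at m small integers no matter how many items go in. A Bloom filter, by contrast, answers "have I seen this item?" rather than "how many distinct items have I seen?".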

1 comment

Consider my metaphorical hat eaten. Thanks for the cool tools! I'm currently working with Postgres and this looks like a great thing to add to the mix.
There's a Postgres extension that implements HLL, and it's rather useful: https://github.com/aggregateknowledge/postgresql-hll