Semi-related, in the land of Postgres and probabilistic data structures: Redshift supports APPROXIMATE COUNT(DISTINCT ...). It's much, much faster than an exact COUNT(DISTINCT ...), and their stated error is around ±2%.
It probably uses a HyperLogLog; the 2% error rate kind of gives it away. Bloom filters approximate set membership queries, while HyperLogLogs approximate set cardinality queries. COUNT DISTINCT is a set cardinality query.
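For anyone curious what that looks like, here is a minimal HyperLogLog sketch in Python. It is only an illustration of the general technique, not Redshift's implementation; the class name, the choice of SHA-1 as the hash, and the default of 2^12 registers are all my own assumptions. The standard error of a HyperLogLog is roughly 1.04/sqrt(m), so m = 4096 registers gives about 1.6%, which is in the same ballpark as the quoted 2%.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog for estimating the number of distinct items (illustrative only)."""

    def __init__(self, p=12):
        self.p = p                    # number of hash bits used to pick a register
        self.m = 1 << p               # number of registers (4096 when p = 12)
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of a SHA-1 digest as the hash (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                      # top p bits select the register
        rest = h & ((1 << (64 - self.p)) - 1)         # remaining 64 - p bits
        # Rank = position of the leftmost 1 bit in the remaining bits (1-based).
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def count(self):
        # Bias-corrected harmonic mean of 2^register across all registers.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        # Small-range correction: fall back to linear counting while registers are empty.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate
```

Adding the same value twice leaves the registers unchanged, which is why this estimates the distinct count rather than the row count. The whole state is 4096 small integers no matter how many rows you feed it, and that is where the speedup over an exact COUNT DISTINCT comes from.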
Consider my metaphorical hat eaten. Thanks for the cool tools! I'm currently working with Postgres and this looks like a great thing to add to the mix.