by philbe77 232 days ago
good point :) - we can re-aggregate HyperLogLog (HLL) sketches to get a pretty accurate NDV (Count Distinct) - see Query.farm's DataSketches DuckDB extension here: https://github.com/Query-farm/datasketches
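
To illustrate why HLL sketches can be re-aggregated, here is a toy Python sketch. It is not the DataSketches extension's API or implementation. The class `ToyHLL`, its parameter `p`, and the sample data are made up for this example. The point is only that merging two sketches is an element-wise max over their registers, so per-partition sketches can be rolled up without rescanning the raw rows.

```python
import hashlib
import math


class ToyHLL:
    """Toy HyperLogLog with 2**p registers (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, value):
        # Derive a 64-bit hash from the value.
        digest = hashlib.sha1(str(value).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                   # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Re-aggregation: element-wise max of the two register arrays.
        assert self.p == other.p, "sketches must use the same precision"
        merged = ToyHLL(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def estimate(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction (linear counting).
            return self.m * math.log(self.m / zeros)
        return raw


# Hypothetical data: two partitions with overlapping ids.
day1, day2, both = ToyHLL(), ToyHLL(), ToyHLL()
for user_id in range(0, 60_000):
    day1.add(user_id)
    both.add(user_id)
for user_id in range(40_000, 100_000):
    day2.add(user_id)
    both.add(user_id)

merged = day1.merge(day2)
true_ndv = 100_000
print(round(merged.estimate()), "estimated vs", true_ndv, "true")
```

The merged sketch has exactly the same registers as a sketch built over all rows at once, which is what makes the roll-up safe. The estimate itself is still approximate.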

We also have bitmap aggregation capabilities for exact count distinct - something I worked on implementing with Oracle, Snowflake, Databricks, and DuckDB Labs. It isn't as fast as HLL - but it is 100% accurate...
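
As a rough idea of how bitmap aggregation gives an exact answer, here is a minimal Python sketch. It is not the implementation in any of the products named above. It assumes the values have already been mapped to small non-negative integer ids, and the helper names (`build_bitmap`, `count_distinct`) and the sample data are hypothetical.

```python
from functools import reduce


def build_bitmap(ids):
    """Set bit i for every non-negative integer id i in a partition."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def count_distinct(bitmaps):
    """Re-aggregate partition bitmaps with bitwise OR, then count the set bits."""
    combined = reduce(lambda a, b: a | b, bitmaps, 0)
    return bin(combined).count("1")


# Hypothetical per-day partitions of integer ids, with overlap between days.
partitions = {
    "mon": [1, 2, 3, 5],
    "tue": [3, 5, 8],
    "wed": [1, 8, 13],
}

# Build one bitmap per day, then roll them up without rescanning the raw ids.
daily_bitmaps = {day: build_bitmap(ids) for day, ids in partitions.items()}
weekly_distinct = count_distinct(daily_bitmaps.values())
print(weekly_distinct)
```

Because OR never double-counts an id, the roll-up is exact. The cost is that the bitmap grows with the size of the id space, whereas an HLL sketch stays at a fixed size.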

1 comment

I remember BigQuery's DISTINCT only had HLL-level accuracy 10 years ago, but they rather quickly replaced it with an exact count.

How would you compare this solution to BigQuery?