| HN Mirror

In my benchmarks, HyperLogLogs unilaterally outperform MinHash* by a constant factor, while Bloom Filters of the same size become more accurate than hyperloglogs in the 100-500kB range (though this number increases with the cardinalities of sets being compared). The biggest advantage a HyperLogLog has over MinHash and bloom filters is that it works well regardless of the sizes of the sets being compared, while MinHash is sensitive to differences in set size and bloom filters perform terribly in the small sketch domain. (These results aren't yet public, but they will be soon.)

* Part of this might be issues with the original Flajolet et al. estimation algorithm. Ertl provides several improved methods in https://arxiv.org/pdf/1702.01284.pdf.