Hacker News
by donavanm 4199 days ago
I haven't read the paper yet, but I'm a little surprised by the HyperLogLogs. I was under the impression that HLLs break down when your symbol frequency varies by orders of magnitude. Those are exactly the patterns I'd expect to see in block/page access frequencies over time. Are the HLLs only tracked on a smaller temporal scale to increment the distance matrix? Or is there something else I'm missing?
1 comment

The state of an HLL is completely determined by the set of distinct symbols that appear, not by the order or the frequency of those symbols. So inserting a billion A's and a single B into an HLL will have exactly the same outcome as inserting a billion B's and a single A, or even just a single A and a single B.
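
Here is a minimal sketch of why that holds, using a toy HLL rather than any real library. The `TinyHLL` class, the choice of `p=4`, and the SHA-256-based hash are all my own illustrative assumptions, and I use 100,000 repeats instead of a billion so it runs quickly. Each register only ever keeps a maximum, and a maximum ignores both repeats and ordering:

```python
import hashlib


class TinyHLL:
    """Toy HyperLogLog state: 2**p registers, each holding a max rank."""

    def __init__(self, p=4):
        self.p = p
        self.registers = [0] * (1 << p)

    def add(self, symbol):
        # Derive a 64-bit hash from the symbol (hash choice is an assumption).
        digest = hashlib.sha256(symbol.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        tail_bits = 64 - self.p
        idx = h >> tail_bits                  # top p bits select a register
        rest = h & ((1 << tail_bits) - 1)     # remaining bits
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = tail_bits - rest.bit_length() + 1
        # max() is idempotent and commutative: repeats and order cannot matter.
        self.registers[idx] = max(self.registers[idx], rank)


def state_after(symbols):
    """Return the register state after inserting every symbol in order."""
    hll = TinyHLL()
    for s in symbols:
        hll.add(s)
    return hll.registers


skewed_a = state_after(["A"] * 100_000 + ["B"])
skewed_b = state_after(["B"] * 100_000 + ["A"])
minimal = state_after(["A", "B"])

# All three streams contain the same distinct symbols, so the state matches.
assert skewed_a == skewed_b == minimal
```

The cardinality estimate is computed from the registers alone, so identical registers mean identical estimates regardless of how skewed the frequencies were.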

Does this address your concern, or did I misunderstand your point?