by andywarfield 4199 days ago
Thanks! We're pretty early in our use of HLLs within the system but are already managing to get some really cool data off of them. I'm excited to see where this all goes as we build out the system over the next year.

Your last-accessed time point is exactly right. Storage systems used to be able to do a lot with file system-level metadata, but with the size and opaqueness of VM image files, those techniques have become a lot less effective. We're currently exploring how we can use HLLs in combination with a couple of other techniques to do things like clustering co-accessed data and then managing operations like prefetching and demotions over much longer time frames than are typically done in OSes and storage systems.


I haven't read the paper yet, but I'm a little surprised by the HyperLogLogs. I was under the impression that HLLs break down when your symbol frequency varies by orders of magnitude. Those are exactly the patterns I'd expect to see in block/page access frequencies over time. Are the HLLs only tracked on a smaller temporal scale to increment the distance matrix? Or is there something else I'm missing?
The state of an HLL is completely determined by the set of distinct symbols that appear, not the order or the frequency of those symbols. So, inserting a billion A's and a single B into an HLL will have exactly the same outcome as inserting a billion B's and a single A, or even just a single A and a single B.
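
To make that concrete, here is a toy sketch of the register update at the core of an HLL. It is not the implementation used in the system discussed here: the `TinyHLL` class, the choice of SHA-256 as the hash, and the register count are all assumptions made for illustration, and the cardinality estimator is left out because only the state matters for this point. The loop uses 100,000 repeats rather than a billion so that it finishes quickly.

```python
import hashlib


class TinyHLL:
    """Toy HyperLogLog-style state (hypothetical, for illustration only)."""

    def __init__(self, p=4):
        self.p = p                        # top p hash bits select a register
        self.registers = [0] * (1 << p)   # each register keeps a maximum rank

    def add(self, symbol):
        # Derive a 64-bit hash from the symbol (SHA-256 is an arbitrary choice).
        digest = hashlib.sha256(symbol.encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                  # register index from the top p bits
        rest = h & ((1 << width) - 1)     # remaining bits of the hash
        # Rank = 1-based position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        # Taking a max is idempotent and order-independent, so repeats are no-ops.
        self.registers[idx] = max(self.registers[idx], rank)


heavy_a = TinyHLL()
for _ in range(100_000):
    heavy_a.add("A")
heavy_a.add("B")

heavy_b = TinyHLL()
heavy_b.add("A")
for _ in range(100_000):
    heavy_b.add("B")

minimal = TinyHLL()
minimal.add("B")
minimal.add("A")

# All three sketches saw the same set {A, B}, so their state is identical.
print(heavy_a.registers == heavy_b.registers == minimal.registers)
```

Because each insert only takes a maximum over a value derived from the symbol's hash, re-inserting a symbol that has already been seen can never change a register, which is why skewed frequencies do not affect the state.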

Does this address your concern, or did I misunderstand your point?