|
|
|
|
|
by millisecond
4417 days ago
|
|
HLL also has two nice real-world optimizations possible depending on use-case. We're storing 100,000+ unique counters, but only around 1% have more than 100 unique objects counted. Some of those 1% have millions of records so HLL is very useful. As the HLL itself is a fixed size (~10kb for decent accuracy) regardless of #counted objects, in the small case you can replace the HLL with a pure set of counted values and produce a HLL when it grows beyond a bound. Because you're storing the raw values, the transition to HLL is seamless. Once you've moved beyond raw storage of values there's a harder but still space-saving technique. If you look at the raw bytes of a ~10kb HLL structure with "only" 10's of thousands of counted values around 90% of them will be zero. Below a certain bound it can save a lot of space to have a map of locations and non-zero byte values rather than a raw array of bytes. |
|