Hacker News new | ask | show | jobs
Show HN: Compressing a Histogram from 16GB to 2KB (alexkranias.com)
5 points by alexkranias 159 days ago
Thought I'd dive into this cool streaming/approximation algorithms problem I encountered a few months ago.

TLDR: we can create our own custom floating point representation for encoding and decoding integers and use this to index into a tiny 2D histogram, upper-bounding approximation error based on the number of bits per integer we specify to keep.

---

Broadly useful for aggregating statistics from massive data streams of user data. Also turns out this is incredibly similar to data structures used in production, like HDRHistogram.