Hacker News new | ask | show | jobs
by insanitybit 974 days ago
Well if your integers are sequential you can encode huge numbers of them using diff + RLE in just a few bytes, likely far fewer than 1/2 a byte on average, for the right dataset (in theory you can store 1,2,3,4,5...10_000 in 2 bytes).

But for other integer datasets there's FastPFOR

https://github.com/lemire/FastPFor

The linked papers there will talk about techniques that can be used to store multiple 32bit integers into a single byte, etc. Integer compression is pretty powerful if your data isn't random. The thing with UUIDs is that your data is pretty random - even a UUIDv7 contains a significant amount of random data.

1 comments

Also had great success with Roaring Bitmaps, a bit further on the compression/ease of manipulation axis.

Edit: I see you're already linking to Daniel Lemire's github, so I guess you're familiar with them, just mentioning them, since it's in a comment thread like this one that I discovered them.