|
|
|
|
|
by insanitybit
974 days ago
|
|
Well if your integers are sequential you can encode huge numbers of them using diff + RLE in just a few bytes, likely far fewer than 1/2 a byte on average, for the right dataset (in theory you can store 1,2,3,4,5...10_000 in 2 bytes). But for other integer datasets there's FastPFOR https://github.com/lemire/FastPFor The linked papers there will talk about techniques that can be used to store multiple 32bit integers into a single byte, etc. Integer compression is pretty powerful if your data isn't random. The thing with UUIDs is that your data is pretty random - even a UUIDv7 contains a significant amount of random data. |
|
Edit: I see you're already linking to Daniel Lemire's github, so I guess you're familiar with them, just mentioning them, since it's in a comment thread like this one that I discovered them.