|
|
|
|
|
by 727374
2306 days ago
|
|
I went down the Bloom Filter rabbit hole for a project a while back and then wondered if I could actually fit the entire set space in memory. The set was IDs up to 2^32, so basically a giant bit array. I believe the IPV4 universe of this article is actually the same size as in my project. I coded up my project using Java bit arrays and got things working decently well, using a big heap. Then I found out there are a bunch of compressed bit array libraries such as EWAH and Roaring Bitmap. When I substituted Roaring for the stock Java Bitset implementation, I saw space and computation improve by many orders of magnitude. Roaring uses a bunch of tricks to achieve this, but it mostly comes down to encoding large runs of 1s or 0s. Obviously, not as tiny as bloom filters, but still pretty small for most modern machines if you have a sparse set. https://www.roaringbitmap.org/ |
|