Hacker News new | ask | show | jobs
by nattaylor 1381 days ago
I enjoyed this walkthrough. I'm interested in RB because it's used in Lucene but never dug in and assumed (without much thought) that it had to do with consecutive runs of 1s and 0s. Wrong!
1 comments

Compressing consecutive 1's and 0's was the first idea that popped into my head as well. Then when they started to talk about inserting into the 800,000th position my brain went "What if they just reversed the array and stored some metadata as the type of array they're dealing with?"

It's funny that in the end Bitmaps essentially are a modified data structure and process.

The way things are going, I imagine someone will at some point take all the different tricks we have of inserting, sorting, finding, deleting data, take millions of datasets and run some machine learning type process to create libraries that can perform these operations optimally.

I was thinking a similar thought that 4096 seemed quite magic (although it seems to be chosen based on research) and that RB perf could probably be tuned based on the workload
I don't think 4096 is arbitrary. It's an array of 16bit integers, and 4096 * 16 = 65536. So 4096 represents the boundary point below which an array of integers uses less memory than a traditional bitmap.
Oops, thanks for pointing that out