Hacker News new | ask | show | jobs
by Xeoncross 1385 days ago
Can someone explain how this is laid out in memory and what it would look like serialized? It's easy to have an array of N size that fits all bits and just write to disk. You can also mmap a file as a large array of bytes and just mutate it and let the OS handle the disk syncs if you're okay with some data loss of recent state changes if the machine crashes.

What would an array of pointers to X containers which are N size each look like on disk?

2 comments

The containers are not in a contiguous memory block. To serialize it, you could write out each container as a length prefixed block with the container id, which is the 16-bit MSB of the container. The top level sorted list can be ignored, as it can be reconstructed.

To de-serialize, read each container by reading its prefix length and its container id, and then read the rest of the container by the length. The top level list is sorted by the container id. It can be reconstructed by inserting the container id with the container memory pointer into the sorted list.

Just to add a bit. The top level list can be appended to the end of the file as a directory of the pairs of id and the file offset of the container. In this way, the list at the end of the file can be read into memory to form a mapping of container-id to file offset, which can provide random access to the container blocks in the file.
This article has a section titled "how Roaring bitmaps are represented in memory". I suggest you actually read the article, it will likely answer all your questions.
I was actually asking how they were serialized and stored on disk.
The on-disk format is documented in the spec: https://github.com/RoaringBitmap/RoaringFormatSpec

It's very simple actually. I worked on adding run container support which involved adding support for the newer serialization format in roaring-rs and it proved quite elegant.