Hacker News new | ask | show | jobs
by jstimpfle 2784 days ago
You can implement most things with dynamically allocated vectors. Just use indices instead of pointers to link the elements. This can also bring advantage in space efficiency if you're able to do with 4 byte indices instead of 8 byte pointers.
4 comments

I tend to use C++'s std::deque as a poor man's arena allocator. I find it works even better than vectors with indices: indeed C++ says pushing at the end will validate iterators but not pointers to the elements themselves. This gets rid of the infrequent resizing in vectors.
I think you have a typo in there, "validate iterators" -> "invalidate iterators".

It should be easy to build a non-invalidating version of the iterator that has a pointer to the chunk and an offset within that chunk. That's more like 12 or 16 bytes, though.

What I like about vectors is that an index is not really about the storage that the vector holds at that position. The index is a mathematical identity of an object. It's the object itself. The vector is only a materialized function that holds some attribute for each object. The nice thing is we can have multiple independent vectors for each object type. This is true modularity.

We can get an address from a simple index with deques, too. But only with significant indirection. So I think if (1) no movement cost on reallocation is allowed, and/or (2) stable pointers are needed, deques are a fine choice. But vectors are my first choice.

Given that we have huge address spaces nowadays we could also consider preallocating large global vectors at stable addresses, and expanding them with mmap(). This way the copying when growing the vector is not needed.

Yes, that's how works the RB-tree example I linked. However, it is much harder doing that for implementing tries, except if you work only with the full node case, which would imply wasting a huge amount of space. At least, from my experience.
You can also use vector of vectors. Reallocating would be cheaper. But access would incur extra indirection.
You can consider the entire virtual memory space to be a big vector, where a pointer is just an index into it!
However that's a pretty poor vector. It's not homogeneous (you put things of different shapes inside). That also implies you need sophisticated memory management, leading to further overheads. You cannot meaningfully iterate it.
I was mostly joking…
Yes and I'm serious :-)
Yes, that's exactly how we do it in Fortran 77.