Hacker News new | ask | show | jobs
by alexhutcheson 2355 days ago
If you're not interleaving insertions and lookups, then a sorted vector can be really good.

If you need to interleave insertions and lookups, but don't need to traverse in sorted order, then a good hash table (not std::unordered_map) is normally the best option.

You only need a tree if you need traversal in sorted order and interleaved insertions and lookups, which is pretty uncommon. Even then you are almost always better off with a B-tree than a red-black tree.

2 comments

Even with interleaved insertions and lookups, there are common scenarios that make a sorted vector still significantly more performant. I wrote about it when we published our open source SortedList<T> implementation for .NET and .NET Core [0], specifically comparing it to AVL and Red-Black trees.

[0]: https://neosmart.net/blog/2019/sorted-list-vs-binary-search-...

That's a good point, and a really interesting blog post.
Seems like these criteria miss the use case for mm_rb, one of the central red-black trees in linux.

If you need to be able to store intervals (i.e. virtual memory areas) and do lookups based on any address in the interval, not just the base address, I don't think a hash map is the best option.

IMO "lookups based on any address in the interval" requires "traversal in sorted order", although I could have probably been more precise in my terms.

I would be curious if anyone has ever profiled the impact of changing mm_rb to a B-tree. It might be very difficult if existing code that uses mm_rb depends on pointer stability, though.