| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by samsaga2 1247 days ago
	I always listened that unordered_map is more cache friendlier than map. My intuition tells me unordered_map is better to iterate, and map is better to random access.

3 comments

powersnail 1247 days ago

Unordered map is a hash map. Map is a binary search tree.

There’s no random access for hash map, in the sense of “give me the fifth element”. If you mean a lookup, like map[key], hash map has an O(1) amortized look up, where as a binary tree has an O(log n) look up, so hash map will generally be faster in this regard.

link

tialaramex 1247 days ago

By "iterate" I expect they mean to just look at all the key/value pairs in the map. Since it says "Unordered" right in the name they don't care what order they see the elements, just that they see them all.

This is easy to do in all sane hash map designs, and is very fast in a linear design like Abseil's Swiss Tables.

Also sadly C++ std::unordered_map is guaranteed to be not just a hash table but an open hash table, using buckets to keep all the stuff which collided together. This is probably not what you wanted, but too bad that's what was standardized.

link

masklinn 1247 days ago

> is very fast in a linear design like Abseil's Swiss Tables.

It's not that fast, because branching is completely unpredictable when iterating the sparse array (though a very high load factor helps). That's why a few modern hash tables trade an additional indirection on lookup by storing entries in a dense array and making the sparse array an array of indices into that.

It makes iterating the map as efficient as iterating an array (or zipping two depending on data layout), it also saves some memory (especially if the indices array is adaptive), and has the neat side-effect of making the map conserve insertion order (though deletion is a bit of a concern, tradeoffs may have to be made with the other advantages).

link

tialaramex 1247 days ago

Good point about the branch cost.

For a general purpose hash map, I think promising to preserve order despite deletion (like Python's current dict) is a mistake in performance terms†. Python could afford a penalty here because their previous dict was so ludicrously bad that you won't notice. We can't know if the user of a general purpose hash map does 0% deletes or 50% deletes nor whether the ordering is valuable to them at all anyway.

† If we don't promise to preserve order then delete just swaps the removed item from the dense array with the last item, then removes the last item, very efficient, if we insist on preserving order we're either using tombstones which mean branches or we're shuffling half the bloody array for each deletion.

link

sicp-enjoyer 1247 days ago

This is quite unfortunate. The way to get a really good hash table is by enforcing a bunch of simplifying assumptions (power of 2 sizes, sentinel values, etc). But the C++ committee has to make the "one true table" for everyone.

std::map actually seems to fit this role better. It works reasonable well for many workloads and types without tuning a bunch of parameters.

link

sicp-enjoyer 1247 days ago

> so hash map will generally be faster in this regard.

Complexity is not a measure of runtime. The performance drawbacks for std::map have to do with cache, not O(log) vs O(1). log(billion) is 30. And that's exactly how many comparisons you have to do, not a constant multiple of that.

With an open addressing hash table, who knows how many steps you will need to find an element. It's even more questionable if it does linked list chaining.

> binary search tree.

It's usually a red black tree and maybe could be a b-tree? "Search tree" is accurate.

link

masklinn 1247 days ago

> maybe could be a b-tree?

IIRC the complexity requirements of std::map preclude a b-tree (or require a binary tree). Though I don't remember the exact reasoning.

abseil's website makes the same claim, though also without justification:

> B-trees have a different implementation from STL std::map containers, which require binary trees commonly implemented as red-black trees.

link

powersnail 1246 days ago

> It's usually a red black tree and maybe could be a b-tree? "Search tree" is accurate.

I mean, I haven’t write red black trees for a while, but it is a type of binary search tree, as far as I remember, isn’t it?

link

sicp-enjoyer 1246 days ago

You are correct. Sorry if I am being pedantic. I was also trying to suggest it could be a b-tree but others have pointed out that it doesn't quite fit the spec.

link

masklinn 1247 days ago

> My intuition tells me unordered_map is better to iterate, and map is better to random access.

map is definitely not better for random access, with the possible exception of adversarial input: IIRC std::map pretty much requires an rbtree so you get an O(log2(n)) access, while unordered_map is a hashmap so has an O(1) best case but O(n) worst case.

unordered_map might have a better cache friendliness (though that's debatable as IIRC the standard requires closed-hashing), however iterating it is a long series of unpredictable branches (each bucket can be empty or full and the better your hash function the more random it is), std::map requires a lot of pointer-chasing instead but it's all completely predictable.

link

jeffbee 1247 days ago

Map is unfriendly in several ways because of the nodes being allocated with new by default. You can significantly improve its behavior by constructing the map with an arena allocator. Then you will enjoy better cache locality and cheaper mutations.

link