Hacker News new | ask | show | jobs
by nhaehnle 4010 days ago
computing the hash of a string key is typically an O(n)

This is a good point, though to be fair, this aspect of it is usually also ignored when analyzing the main alternative data structures, which are balanced binary trees. At least every successful lookup in such a tree with strings as keys also has to read the entire string.

This means that hashes probably still have an advantage over balanced binary trees. Tries, on the other hand, could really beat hashes when the keys are strings. In practice, they might have a cache and branching disadvantage, though.

1 comments

Actually, modern tries can have good cache behavior, especially at larger sizes. Hash maps inherently store data haphazardly inside themselves while tries keep it in order with similar keys near each other in memory.

At the very least, this makes it easier for your algorithms to be more cache friendly, because tries' cache behavior is predictable in a way that hash maps' isn't.

This is even more relevant for branch prediction: hash functions (by design) are harder to predict, giving the lookup algorithm of a trie an advantage.

A cache aware design like "adaptive radix trees" delivers really good performance in practice, comparable or better than hash maps while also supporting fast iteration and certain queries (like iterating over all keys with a given prefix).

Take a look at the adaptive radix tree paper[1]: the idea is very accessible, and the resulting performance very impressive. They also generalize the system to work with different types of keys through a systematic method for turning values into binary strings.

[1]: http://www3.informatik.tu-muenchen.de/~leis/papers/ART.pdf