Hacker News new | ask | show | jobs
by shultays 1934 days ago
Imho "but technically..." is not a valid opinion while in practice the access is O(1) on average. Yea sure, it becomes linear if your hash is "return 42;", it can't grantee that you supplied a good hasher.
4 comments

It's not technically. Worst-case guarantee is required for some applications. Hash Maps have better amortized complexity but some implementations have bad worst case time.

Java uses balanced trees instead of linked list in their chained-hashtable implementation, if I recall correctly.

There are lots of reasons to prefer trees (and, correspondingly, lots of reasons to prefer hashtables); I just pointed out ordering isn't the only one, and I merely gave another (worst-case performance guarantee). For example, yet another one is iterator stability; insertion can invalidate iterators for unordered containers, making them useless for some applications that need to analyze data efficiently as they insert.

I could go on, but it's very much detracting from the point of the paper to argue about the data structure choice when the paper isn't on this topic at all or trying to analyze any particular application to begin with.

Technically is a very valid opinion if you're writing software that needs to be robust and secure in the face of user input. std::set is guaranteed to be logarithmic strictly in the number of comparisons in the worst case without exception.

std::unordered_set has some very difficult to reason about worst case scenarios that must factor in both the size of the collection and the hash function. Unless you are very careful about the type of hashing you use, you can easily be vulnerable to DDOS attacks that scale on the order of O(k * n) for key size k and container size n.

Note that this isn't just some user-supplied bad hashers. Hash for std::string in GNU library is not much harder to hack for collisions than your example. And also it uses the whole string, so is slow on long strings. I didn't investigate if other implementations of STL are any better.