Hacker News new | ask | show | jobs
by afranchuk 2328 days ago
A BTreeMap should typically have O(n) memory usage, whereas a HashMap (depending on load factor) will usually have O(kn) memory usage, where k > 1. This is because a HashMap allocates the table into which it will store hashed values upfront (and when the load is too great), so it can't anticipate how many values may be added nor what sorts of collisions may occur at this time. Yes, collisions are typically stored as some allocate-per-item collection, but the desire of a HashMap is to avoid such collisions. A BTreeMap allocates for each new value.

Note that this explanation is a bit handwavy, as both data structures have numerous optimizations in production scenarios.

2 comments

There is no difference between O(n) and O(kn), if k is a constant. The notation deliberately ignores constant factors. (That's why you can say a BTreeMap requires O(n) memory independent of the size or type of data being stored, provided there is some finite upper bound on the sizes of the keys and values.)
Yeah I know, it was just the fastest way to indicate that the constant factor was almost definitely larger for HashMaps. But thank you for clarifying!
> collisions are typically stored as some allocate-per-item collection

Rust's HashMap stores the collisions in the same table as the non-collisions (open addressing), not in a separate collection.

This is true, thanks for the specifics. I was answering the question from a more generic perspective, but failed to mention that many implementations rehash on collision...