Hacker News new | ask | show | jobs
by falcolas 3609 days ago
> the best-performing data structure for looking up sorted keys cannot do those queries faster than O(log(n))

Feels like a digression off the actual tree structure, but isn't this incorrect, since a hash map can do key lookups in O(1)? With caveats, of course.

It's a great idea over all, but I'd also be curious how well overlaying an event log on a cuckoo hash would work in comparison.

4 comments

A hashmap isn't sorted keys. But 'O(log(n))' is still incorrect. The fastest you can look up information physically represented in a 3D universe is 'O(cube-root(N))'. That applies to radom access memory and hashmaps, too. Cf. Myth of RAM [1].

[1] http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...

Creator here. As other commenters noted, this data structure only requires keys to be sortable, not hashable, and the best sorted performance we can achieve is O(log(n)).

That said, when a Cuckoo hash gets very full and bounces entries around a lot, there might be an advantage to buffering operations and choosing insertion patterns that reduce the batch's insertion time. Then again, Cuckoo hashes already perform so well for situations they're designed for, so it's hard to improve them with an event log overlay.

It's a common misconception that hashmaps have constant time lookups. In order for the lookup to be constant time, the hash function must output enough bits to avoid causing too many collisions. For example, 16 bits is too little for a hash map with a million keys. In fact, the hash function output size should be O(log(n)) where n is the number of entries. Processing log(n) bits takes at least O(log(n)) time. QED
> Processing log(n) bits takes at least O(log(n)) time.

No, because log(n) might be smaller than the word size of the machine, which is the common case for hash table implementations.

If you limit log(n) and thus n to a constant, like the word size of the machine, then every algorithm is constant time.
No, I am not forcing all n of any size to a constant, I'm saying that we can bottom out inductively for small n.

This is a very old discussion. Read Knuth's rationale. Then read the similar sections in Segwick et all or whatever other basic competence textbook on algorithms you like. Compare them, think about which contexts each approach is most useful in.

In other words, no, you have not somehow cleverly invalidated all of complexity analysis.

When log(n) is within the word size the hashmap lookup for the most part is actually constant i.e., few specific machine operation. A generic log(n) algorithm still requires log(n) distinct operation for the most part however small that number is.
keyword here is sorted. hash map is unordered