Hacker News new | ask | show | jobs
by dkopi 3741 days ago
(I'm assuming you're a 5 year old who's at least been through a course on algorithms and data-structures):

It's vaguely stated in the abstract: "Hash-Tree is a key-value multi-tree with fast indexing performance and high space utilization."

The use cases of hash trees are somewhat similar to hash tables (A map where you can quickly look up a key and get it's value), but you'd use a hash tree when you're willing to sacrifice a bit of performance for better memory usage.

"Although hashtable is very fast with O(1) time complexity, but there is always about ~25% table entries are unused, because the hash-table load factor is .75. And this htree is suitable for memory-bounded cases."

3 comments

> "Although hashtable is very fast with O(1) time complexity, but there is always about ~25% table entries are unused, because the hash-table load factor is .75. And this htree is suitable for memory-bounded cases."

Do you know htree achieves this space efficiency? I ask because it looks to me like every node has an array of children, like a standard tree. Doesn't the cost of storing one additional pointer for each key add up to at least 25% space wasted?

>Do you know htree achieves this space efficiency?

This is a new data structure to me too but if I understand it correctly, one aspect of it, if you do it right, is you don't have to store the tree at all. You just store the data in an array, and use the procedure to look up the item in the array. So the tree is virtual. Anyone who gets it, am I right about this?

You would potentially need a sentinel bit or something at each element of the array to tell you whether there is a value there. Or if all your values were non-zero (say you are storing pointers to other structures) then that is already taken care of.

In golang, array is auto-expanding so the children array allocates more space than its length, which depends on the golang internal implementation.

> "Child nodes are stored in an array orderly, and checked by binary-search but not indexed by remainders, array indexing will result redundancy entries, this is for less memory usage, with a bit performance loss. "

I just cannot find some way (such as linked list?) to use the exact space with at-least binary-search time complexity.

What is the actual space overhead of this structure, and how did you measure it?
With robinhood hashing, .90 or higher load factor is still quite practical.
Every not open addressed hash table can use load factors up to 100%, compared to 50-75%. Robin hood is a special case.

Concurrent leapfrog hashing would save the go lock and performance problems better. No need for trees when you can use a proper concurrent in-memory hash table. And you can use string keys.

Any reference for this?
Here's a discussion about load factors for Java, I'm guessing the Go implementation for HashMaps is similar:

http://stackoverflow.com/questions/10901752/what-is-the-sign...

Thank you.