Hacker News new | ask | show | jobs
by teraflop 2801 days ago
I had a similar confusion when skimming the paper. The algorithm only depends on the relative ordering of node "ranks", rather than their absolute values. If you were to treat the distributions as continuous (ignoring ties) then it would make absolutely no difference what distribution you used. (Any two continuous distributions are related by a monotonic transformation, which preserves the ordering relationships between different points of the distribution.)

(A more concrete way of looking at it: a simple way of sampling from a geometric distribution is to choose a uniform random integer and count the number of leading 1 bits. The authors suggest concatenating a few random bits to the end for tie-breaking, so that the rank is a pair of a geometric random variable and a uniform random variable, sorted lexicographically. But such a pair is equivalent to just choosing a uniform random integer, and sorting by that!)

Upon a closer reading, it makes a bit more sense. In section 4, the authors point out that by choosing ranks from a geometric distribution, they only need to store O(log log n) bits per node, instead of O(log n). The geometric distribution of ranks, and the handling of ties, allow them to prove the same time complexity bounds as a normal treap while using slightly less space.

However, I don't see this as a major advantage because either way, it still takes O(log n) bits to store each node's child pointers.