Hacker News new | ask | show | jobs
by mabbo 2795 days ago
Consider the humble array-list- a simpler example to be sure, but it's great for explaining.

Build an array of an initial basic size (16 let's say), and keep a counter of the current size. When you insert items 0 through 15, the time to insert is O(1)- find the current size, insert there. When you insert item 16 however, you need to make a new array of size (currentSize2), move the existing items over to it, then add your new item- which is O(n).

Let's say we insert n items (be in 64, 1024 or 2^32, it doesn't matter). What was the mean, median and mode for an insert?

Well, for any non-doubling case (1-k/k) it took O(1). For any doubling case (1/k of the time) it took O(k). This all adds up to a mode of O(1), a median of O(1) and a mean of (O(1)(k-1) + O(k)*1)/k, or roughly O(2), which is O(1).

Your worst case scenario of an insert is indeed O(n). But that's not the most common outcome.

Similar, from how I'm reading this paper, very rarely you'll have an O(ugly) restructuring during inserts or deletes, but the rest of the time you can expect O(1) while maintaining sorted order. I'm going to have to take a try at implementing it to see how it goes.

3 comments

That's true for array, that expected insert time is O(1), and it still holds for N inserts, because you double the size every time, which yields O(2N) time complexity. This is not the case for this data structure, time complexity of inserting N elements is O( N log N ), therefore expected complexity of inserting single element cannot be O(1), even if single insert sometimes can be O(1),

O(1) in the paper referred to restructuring, after O( log n ) search time.

We could be violently agreeing, not sure :)

> you'll have an O(ugly) restructuring during inserts or deletes, but the rest of the time you can expect O(1) while maintaining sorted order.

It helps to understand why ugly restructuring is rare ... it's because the rank distribution is such that 1/2 of the time the height of the inserted node is 0 (a leaf), 1/4 of the time the height is 1, 1/8 of the time it is 2, etc. The ugliness of restructuring increases as the height increases, which means that ugliness becomes exponentially rarer. But it still takes O(lg n) to locate the insertion point -- remember, half the time the node is a leaf. And this is a binary search tree, so of course it is O(lg n). If insertion time were O(1) then building the tree would be O(n) which would mean an O(n) sort. That's simply not possible.

So, I think that's what is meant when they say:

with O(1) expected restructuring time and exponentially infrequent occurrences of expensive restructuring.

Occasionally there's O(n) expensive restructuring, but given the nature of how it's increasing space, when and why, and when it will need to happen again, it's twice (or whatever) the prior amount of time before it needs to happen again. I'm not up on my asymptotic notations[1] besides big-O, but perhaps this is more succinctly (if more confusingly, to a general audience) by one of the additional notations?

1: https://en.wikipedia.org/wiki/Big_O_notation#Related_asympto...

> given the nature of how it's increasing space, when and why, and when it will need to happen again

This is really rather straightforward. The whole point of the algorithm is to randomly distribute the height of the inserted node such that half the nodes are leaves (height 0), 1/4 have height 1, 1/8 have height 2, etc. Restructuring is more expensive the larger the height, but the frequency of restructuring decreases exponentially with respect to the cost of doing it. Net result: O(1) for restructuring, + O(lg n) to find the insertion point (as is necessarily true for any binary search tree, else we would have a way to do an O(n) sort).