| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by CarolineW 2955 days ago

I've only briefly skimmed the paper, but here's something I've come up with, based on what I read, that might or might not be the same as what they've said, but I think "has legs":

* Randomly choose some fixed sized subset of the data and sort it;

* Based on that, deduce the distribution of the data;

* Based on that, create a destination array, and copy things from the input to roughly the expected right place in the output;

* Run around and fix things.

When sorting numbers, that feels like it has a chance of running quite quickly. For other objects with some other comparison function, not sure how it would work.

2 comments

kax 2955 days ago

It works for datasets whose distribution an underlying machine learning model can successfully derive. Otherwise it would fail:

"it works well for big data sets. Because the big data sets usually have fair statistical properties, and its distribution is usually easy to derive and also has some kind of continuity"

link

avip 2954 days ago

What's the asymptotical complexity of running around and fixing things?

link

dspillett 2954 days ago

That stage is presumably a more traditional sort operation. If the first stage does a good job of estimating the correct order this could also be O(n) or as close to as makes no odds. If the set isn't modelled properly by the earlier stages it will still be O(n log n). Obviously that operation needs to be an algorithm that can take advantage of partially pre-sorted input.

link