by notafraudster 2036 days ago
You need the number of items in the data stream (its cardinality) to substantially exceed the value of the median, since an estimate that starts at 0 can climb by at most one step per item. Also, if the mean is very large compared to the range (or variance), you'd do better setting your initial guess to the first item.
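The comment doesn't spell out the estimator, but the constraints match a fixed-step ("frugal") streaming median: keep one number and nudge it by a fixed step toward each new item. A minimal sketch, assuming that technique with a default step of 1 and a default starting guess of 0; `frugal_median` and its `seed_with_first` flag are hypothetical names for illustration:

```python
import random


def frugal_median(stream, step=1.0, seed_with_first=False):
    """Estimate the median of a stream in O(1) memory (hypothetical helper).

    Assumed update rule: move the estimate `step` toward every item that is
    above or below it. By default the estimate starts at 0; with
    seed_with_first=True it starts at the first item instead.
    """
    items = iter(stream)
    estimate = 0.0
    if seed_with_first:
        first = next(items, None)
        if first is None:
            return estimate  # empty stream: nothing to seed with
        estimate = float(first)
    for x in items:
        if x > estimate:
            estimate += step
        elif x < estimate:
            estimate -= step
    return estimate


# 1,000 items whose median (about 10,000) is far larger than the stream length.
rng = random.Random(0)
data = [rng.gauss(10_000, 5) for _ in range(1_000)]

cold = frugal_median(data)                        # starts at 0
warm = frugal_median(data, seed_with_first=True)  # starts at data[0]
print(cold, warm)
```

Starting from 0, every item is above the estimate, so `cold` climbs one unit per item and ends at exactly 1000.0, nowhere near the true median. Seeding with the first item puts `warm` close to 10,000 from the start.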

I think, as you suggest, it's more fun to ask "given my distribution x, what's a good statistic for the median and what are its properties?" than "what's a good general-purpose technique for finding the median of any distribution subject to computational constraint y?"


You also need the range of the dataset to be much larger than 1 (or whatever step size is used).
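To see why the range must be large relative to the step, here is a hypothetical sketch of the same kind of fixed-step estimator (again assuming the move-one-step-toward-each-item rule; `frugal_median` is an illustrative name) run on values confined to [0, 1). With a step of 1 it can only flip between 0 and 1, while a step much smaller than the range settles near the true median of 0.5:

```python
import random


def frugal_median(stream, step=1.0, initial=0.0):
    """Fixed-step streaming median estimate (hypothetical helper)."""
    estimate = initial
    for x in stream:
        # Move one step toward the item; leave the estimate alone on a tie.
        if x > estimate:
            estimate += step
        elif x < estimate:
            estimate -= step
    return estimate


# 10,000 uniform values in [0, 1): the true median is 0.5, the range is about 1.
rng = random.Random(1)
data = [rng.random() for _ in range(10_000)]

coarse = frugal_median(data, step=1.0)   # step comparable to the range
fine = frugal_median(data, step=0.001)   # step much smaller than the range
print(coarse, fine)
```

The coarse estimate bounces between 0.0 and 1.0 with every item, so its error stays at half the range no matter how much data arrives. The fine estimate's resolution is set by the step, at the cost of needing more items to travel from its starting point.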