Hacker News new | ask | show | jobs
by tjsnyder 5601 days ago
This is absolutely correct. The worst case in any binary search is exactly floor(lg n + 1) where the element doesn't exist. There is no reason to optimize any further than this.
1 comments

To be clear, interpolation search can cut down the runtime by another log factor (`O(log log n)` instead of `O(log n)`) but it's just not worth the effort, especially in this case, precisely because the lines aren't uniformly distributed.
Well clearly they're more likely to hire someone who does go to the effort!

> To be clear, interpolation search can cut down the runtime by another log factor (`O(log log n)` instead of `O(log n)`) but it's just not worth the effort, especially in this case, precisely because the lines aren't uniformly distributed.

It doesn't matter what the distribution is as long as you can learn it. Uniform helps when you are cutting down the search space in regular chunks, sure. The Reddit data wont be uniform - so don't cut the search space uniformly.

As I said, something faster could be conceived if one studies the average number of visitors over a 24 hour period (I don't know where this data could be gathered from).

If you can make your first seek very accurate, then it may be the case that simple linear search from there will get you there faster than Binary Search.

You can still report average and worst-case complexities under the assumption you have taken about how the log data is distributed.

Sorry to keep going on about this, but I just wanted to make my point clear! Good discussion.