Hacker News new | ask | show | jobs
by tomerv 1666 days ago
Practically speaking, it is not a good advice to just treat log(n) as if it is constant. While it's true that log(n) is always small in practice, the conclusion should not be to ignore it, but rather to notice the constant factor. And in practice, usually data structures with O(log n) complexity also have a bigger hidden constant. For example, std::unordered_map is much faster in practice than std::map. Of course, this is not strictly correct, it's just a heuristics. Quicksort with its O(log n) [Edit: O(n log n)] complexity is a counter-example to this.
4 comments

To be clear, that is not the advice I'm giving -- but rather, when your performance looks like p*log n + q, if q is much greater than p/40 -- that is, the constant term dwarfs the logarithmic term -- then it is safe to consider it constant.
> p*log n + q, if q is much greater than p/40 -- that is, the constant term dwarfs the logarithmic term

I think you meant to say "if q is much greater than p TIMES 40".

Ah good catch, yes you are correct.
That seems like a pretty good argument _for_ treating it as constant though and just shifting your focus to how large the constants actually are.
In a virtual memory system, random access to an array of size n takes O(log n) time, and the constant factors in that O(log n) are also nontrivial. Algorithms that do O(log n) computation with O(log n) independent elements tend to take O(log^2 n) time, while those that do O(log n) computation with O(log n) contiguous elements or O(log n) iterations with O(1) elements still take O(log n) time. If the constant factors are small enough, it can be hard to distinguish the latter two from algorithms doing O(1) computation with O(1) elements.
In practice, for any memory system with caches and limited by the speed of light, random (unpredictable) access to an array of size n takes much closer to O(sqrt(n)), not O(log(n)). There's an excellent article discussing this that you can search for, and it holds both in emperical tests on modern hardware and in the theoretical physical limit.
That depends on your perspective. If multiple levels of memory hierarchy are relevant (such as when scaling from 1 MiB to 1 GiB), you will see something resembling O(sqrt(n)). If you remain within the same level (e.g. from 1 GiB to 1 TiB), the scaling resembles O(log n) more closely. Or, in other words, it depends on whether you assume that cache size grows with n or is independent of it.
And of course, nothing is important until you’ve profiled your code and measured that it is.