Hacker News new | ask | show | jobs
by colanderman 1670 days ago
The best thing I learned in algorithms class is:

For all realizable n, ln(n) is less than 40.

So when amortizing over a large window (say, a block size of 512), it can be the case that, for any realizable n, the logarithmic amortized costs are completely dominated by the constant non-amortized costs (adding an element to a block in this case)... making the operation effectively constant-time.

3 comments

Practically speaking, it is not a good advice to just treat log(n) as if it is constant. While it's true that log(n) is always small in practice, the conclusion should not be to ignore it, but rather to notice the constant factor. And in practice, usually data structures with O(log n) complexity also have a bigger hidden constant. For example, std::unordered_map is much faster in practice than std::map. Of course, this is not strictly correct, it's just a heuristics. Quicksort with its O(log n) [Edit: O(n log n)] complexity is a counter-example to this.
To be clear, that is not the advice I'm giving -- but rather, when your performance looks like p*log n + q, if q is much greater than p/40 -- that is, the constant term dwarfs the logarithmic term -- then it is safe to consider it constant.
> p*log n + q, if q is much greater than p/40 -- that is, the constant term dwarfs the logarithmic term

I think you meant to say "if q is much greater than p TIMES 40".

Ah good catch, yes you are correct.
That seems like a pretty good argument _for_ treating it as constant though and just shifting your focus to how large the constants actually are.
In a virtual memory system, random access to an array of size n takes O(log n) time, and the constant factors in that O(log n) are also nontrivial. Algorithms that do O(log n) computation with O(log n) independent elements tend to take O(log^2 n) time, while those that do O(log n) computation with O(log n) contiguous elements or O(log n) iterations with O(1) elements still take O(log n) time. If the constant factors are small enough, it can be hard to distinguish the latter two from algorithms doing O(1) computation with O(1) elements.
In practice, for any memory system with caches and limited by the speed of light, random (unpredictable) access to an array of size n takes much closer to O(sqrt(n)), not O(log(n)). There's an excellent article discussing this that you can search for, and it holds both in emperical tests on modern hardware and in the theoretical physical limit.
That depends on your perspective. If multiple levels of memory hierarchy are relevant (such as when scaling from 1 MiB to 1 GiB), you will see something resembling O(sqrt(n)). If you remain within the same level (e.g. from 1 GiB to 1 TiB), the scaling resembles O(log n) more closely. Or, in other words, it depends on whether you assume that cache size grows with n or is independent of it.
And of course, nothing is important until you’ve profiled your code and measured that it is.
Random fact of the day: The currently best known upper bound for the complexity of the Union-Find algorithm is the reverse Ackermann function, which can be treated as a constant (4) for all remotely practical (and most of the impractical) values of n.
Tarjan's proof of inverse* Ackermann bound is hard to understand; it's much easier to understand the proof of the log*(n) bound, which is also an extremely slow growing function: https://people.eecs.berkeley.edu/~vazirani/algorithms/chap5....

where log* is the iterated log function; i.e. the number of times you have to apply log repeatedly until you reach a value of 1 or smaller. For reference, log_2*(10^10000) = 5.

https://en.wikipedia.org/wiki/Iterated_logarithm

I wonder if it is true that everything is O(1) if there is an upper bound? Even for an algorithm with O(n^n) complexity, it is still O(1) if n is bounded, just with a extremely large constant.
n has to be bounded much much smaller, and the constant overhead of the algorithm in question much much larger, for any of those approximations to be valid. The approximation works for log n only because it's not uncommon to have constant overheads which dwarf log n for all plausible n (and only works in such cases!) This is very unlikely to be true even for a linear algorithm.