|
|
|
|
|
by mcphage
3910 days ago
|
|
Generally you move to a larger hash table long before you get that much key collision. Which is a big cost once, but the amortized cost doesn't change much. But in general with these discussions, you assume that certain operations are constant on your "machine". For instance, addition is usually considered constant even though it isn't if your numbers are larger than your machine's word size. In the case of the hash tables, you assume your hash is constant, even if, as you note, it isn't really. But the fact is, the coefficient for the O(log N) term is so small, that it doesn't really offer any useful insight towards the performance of the algorithm. |
|
No, it's a big cost every time you pass a threshold, which happens an infinite number of times, because big-O assumes numbers can be arbitrarily large.
>But in general with these discussions, you assume that certain operations are constant on your "machine". For instance, addition is usually considered constant even though it isn't if your numbers are larger than your machine's word size.
As long as the max number size is orthogonal to the number of elements n, you can assume a bound on the time for an addition operation, and then it becomes a constant in your run-time expression with respect to n. No sleight-of-hand there.
>In the case of the hash tables, you assume your hash is constant, even if, as you note, it isn't really. But the fact is, the coefficient for the O(log N) term is so small, that it doesn't really offer any useful insight towards the performance of the algorithm.
All true, but the spec for big-O doesn't say "assume away stuff that doesn't matter in practice". It says, "as n gets arbitrarily large". When you make inconsistent, varying assumptions between problems, you're not doing STEM anymore; you're doing memorization, the kind of thing STEM isn't supposed to be like.
If "the" definition of big-O consistently included a definition that "we never have to deal with more than 2^64 elements", that would be fine. Instead, we have to memorize "the right answers" for when you can and can't assume that, and only then does O(1) for hash lookup pop out.
And if you really just care about in-practice, then you can no safely assume that a typical balanced binary tree lookup takes fewer ops than e.g. SHA256.