| That example may have misfired-- perhaps consider that sorting an array of n elements requires addressing n elements, which requires arithmetic on words of log n bits, which are therefore log time operations. Limiting the size of each element won't help your complexity. But we really don't care about that very much. In this case. It's kind of described in that SO post I linked-- you can assume a constant-time hashing operation, but if that offends your conscience, assume an "integer RAM" model where arithmetic operations up to a certain word size w are constant time. Then you can observe that any reasonable w is inconsequential to your problem domain, and go back to treating hashing as a constant-time operation :) The idea is that the fact that w increases at least as the log of n is shared by all computations modeled in integer RAM, so it's not a useful contribution to analysis at that scale. It's in the model, it's just not generally useful to the model. If you ever find yourself asking, is the bit width relevant? You can do some math and get a straight answer. Of course real world algorithms often hash strings which complicates the analysis, but that's orthogonal, like my misstep earlier in implying the size of an element rather than the size of n. The mathematical sense in which hashing time must increase with the log of n is a fact of all algorithms that have ns. I think your surprise at this just stems from wanting a rigorous definition for something that's more of a modeling tool, with different models to choose from. In real life, there's the time on the wall since you clicked "Run", and everything else is theorygramming. |
If you go the route that requires you to handle arbitrarily large tables, then computing a hash with a long enough value necessarily requires more steps, even if you assume all operations on length w integers are constant time -- because eventually your keys have to be larger than w, after which computing the hash increases in number of steps with the key size.
This is still different from the sorting problem. You can have a consistent computation model in which memory seek times are constant (even if unrealizable). That seems fundamentally different from a model where a hash computation is constant steps but also has arbitrarily long output.
The issue isn't whether the scaling of the hash computation matters in practice, but whether the big-O is properly O(1). And even if you did just care about "in practice" limitations, you're stuck with the problem of the hash computation taking longer than any realizable trie lookup would (as another poster mentioned is common), meaning that the O(1) claim misrepresents performance relative to another data structure.