And in practice it's secretly O(logn) or thereabouts if you microbenchmark it, due to caching (if your whole hash table fits in L1, it will be faster than a hash table that doesn't)
Worse, it's O(sqrt(n)) because we lay out data on roughly a plane and time is bounded below by how far away the data is.
O(1) is in the mythical (easy to analyze, good approximation due to small constants on the sqrt(n) and log(n) terms) data model where random memory access is O(1), but it's not physically possible.
While you're factoring in real world constraints, you've gone back to theory rather than practice. In practice the RAM is not far enough away for the travel time to get past a rounding error. Also very high RAM installs use all three dimensions to pack more.
Yeah the justification I gave is pure theory until you're talking large distributed systems... and the theory amuses me here so I'm going to keep talking about it first.
That's an interesting correlation between the sizes and latencies for cache levels. Thanks for linking it.
Though once you're at the "distributed" tier, I wouldn't expect further increases to be sqrt. Distributed within a datacenter is probably going to have a log(n) topology of links, and distributed across datacenters is a fixed number of milliseconds based on the area you're serving with no real connection to amount of RAM.
Eh... if the algorithm is dominating your apps runtime the data it loads is probably in the fastest storage device you have that fits. Whether that's L1 cache or a cluster of a dozen computers each with a TB of ram. If not, you can probably optimize by making that the case. At which point `n` is actually the size of the data for the slow algorithm.
I agree you don't often care about this scaling factor though, the O(1) random access ram model gets you pretty far and after that you're probably interested in optimizing for some specific data size anyways.
The link __s posted is really more the level where you need to be aware of the sqrt(n) cost, as you scale from one device to another (L1 cache -> L2 cache -> L3 cache -> ram -> SSD -> SSD in another computer in the same datacenter -> SSD in another computer in the same continent...)
As far as I know (and I may just be ignorant), ignoring the very important part of the equation that are caches, it doesn't really matter what row/column you address in RAM, at that level things are dictated by clock speeds.
Caches are obviously very important though, and beyond optimizing the probability of cache hits, on modern CPUs some cores are "closer" than others and thus cache-interactions between them are faster: https://github.com/nviennot/core-to-core-latency And if you optimize based on this you can get speedups, both at the macro level by doing things like scheduling things that talk on "close" cores, and at the micro level by doing things like implementing NUMA aware locking primitives: https://dl.acm.org/doi/10.1145/3477132.3483557
There's also definitely been CPUs (not sure if this is still a thing) where some cores share memory channels and some cores don't so you can access RAM faster (higher bandwidth) if you spread your access between the two sets of cores instead of staying within one.
O(1) is in the mythical (easy to analyze, good approximation due to small constants on the sqrt(n) and log(n) terms) data model where random memory access is O(1), but it's not physically possible.