Hacker News new | ask | show | jobs
by alwahi 321 days ago
okay wait, i thought hash table lookup was already O(1)...
2 comments

O(1) only means that the lookup time doesn't depend on the number of elements in the table. It can be 10 nanoseconds or 10 days.
And in practice it's secretly O(logn) or thereabouts if you microbenchmark it, due to caching (if your whole hash table fits in L1, it will be faster than a hash table that doesn't)
Worse, it's O(sqrt(n)) because we lay out data on roughly a plane and time is bounded below by how far away the data is.

O(1) is in the mythical (easy to analyze, good approximation due to small constants on the sqrt(n) and log(n) terms) data model where random memory access is O(1), but it's not physically possible.

While you're factoring in real world constraints, you've gone back to theory rather than practice. In practice the RAM is not far enough away for the travel time to get past a rounding error. Also very high RAM installs use all three dimensions to pack more.
Yeah the justification I gave is pure theory until you're talking large distributed systems... and the theory amuses me here so I'm going to keep talking about it first.

But it's actually a place where practice happens, by coincidence, to match theory pretty well: https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

That's an interesting correlation between the sizes and latencies for cache levels. Thanks for linking it.

Though once you're at the "distributed" tier, I wouldn't expect further increases to be sqrt. Distributed within a datacenter is probably going to have a log(n) topology of links, and distributed across datacenters is a fixed number of milliseconds based on the area you're serving with no real connection to amount of RAM.

This is a bit misguiding however, since the `n` here represents the total amount of RAM on the system, not the one used by the algorithm.
Eh... if the algorithm is dominating your apps runtime the data it loads is probably in the fastest storage device you have that fits. Whether that's L1 cache or a cluster of a dozen computers each with a TB of ram. If not, you can probably optimize by making that the case. At which point `n` is actually the size of the data for the slow algorithm.

I agree you don't often care about this scaling factor though, the O(1) random access ram model gets you pretty far and after that you're probably interested in optimizing for some specific data size anyways.

I would love to read more about this. Should I be looking at RAM row/column addressing costs?
The link __s posted is really more the level where you need to be aware of the sqrt(n) cost, as you scale from one device to another (L1 cache -> L2 cache -> L3 cache -> ram -> SSD -> SSD in another computer in the same datacenter -> SSD in another computer in the same continent...)

As far as I know (and I may just be ignorant), ignoring the very important part of the equation that are caches, it doesn't really matter what row/column you address in RAM, at that level things are dictated by clock speeds.

Caches are obviously very important though, and beyond optimizing the probability of cache hits, on modern CPUs some cores are "closer" than others and thus cache-interactions between them are faster: https://github.com/nviennot/core-to-core-latency And if you optimize based on this you can get speedups, both at the macro level by doing things like scheduling things that talk on "close" cores, and at the micro level by doing things like implementing NUMA aware locking primitives: https://dl.acm.org/doi/10.1145/3477132.3483557

There's also definitely been CPUs (not sure if this is still a thing) where some cores share memory channels and some cores don't so you can access RAM faster (higher bandwidth) if you spread your access between the two sets of cores instead of staying within one.

Exactly. So many people get this wrong, it should be a question in interviews.
Now it's O(0.5)