| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Retr0id 368 days ago
	And in practice it's secretly O(logn) or thereabouts if you microbenchmark it, due to caching (if your whole hash table fits in L1, it will be faster than a hash table that doesn't)

1 comments

gpm 368 days ago

Worse, it's O(sqrt(n)) because we lay out data on roughly a plane and time is bounded below by how far away the data is.

O(1) is in the mythical (easy to analyze, good approximation due to small constants on the sqrt(n) and log(n) terms) data model where random memory access is O(1), but it's not physically possible.

link

Dylan16807 367 days ago

While you're factoring in real world constraints, you've gone back to theory rather than practice. In practice the RAM is not far enough away for the travel time to get past a rounding error. Also very high RAM installs use all three dimensions to pack more.

link

gpm 367 days ago

Yeah the justification I gave is pure theory until you're talking large distributed systems... and the theory amuses me here so I'm going to keep talking about it first.

But it's actually a place where practice happens, by coincidence, to match theory pretty well: https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

link

Dylan16807 367 days ago

That's an interesting correlation between the sizes and latencies for cache levels. Thanks for linking it.

Though once you're at the "distributed" tier, I wouldn't expect further increases to be sqrt. Distributed within a datacenter is probably going to have a log(n) topology of links, and distributed across datacenters is a fixed number of milliseconds based on the area you're serving with no real connection to amount of RAM.

link

SkiFire13 368 days ago

This is a bit misguiding however, since the `n` here represents the total amount of RAM on the system, not the one used by the algorithm.

link

gpm 368 days ago

Eh... if the algorithm is dominating your apps runtime the data it loads is probably in the fastest storage device you have that fits. Whether that's L1 cache or a cluster of a dozen computers each with a TB of ram. If not, you can probably optimize by making that the case. At which point `n` is actually the size of the data for the slow algorithm.

I agree you don't often care about this scaling factor though, the O(1) random access ram model gets you pretty far and after that you're probably interested in optimizing for some specific data size anyways.

link

hn_go_brrrrr 368 days ago

I would love to read more about this. Should I be looking at RAM row/column addressing costs?

link

gpm 368 days ago

The link __s posted is really more the level where you need to be aware of the sqrt(n) cost, as you scale from one device to another (L1 cache -> L2 cache -> L3 cache -> ram -> SSD -> SSD in another computer in the same datacenter -> SSD in another computer in the same continent...)

As far as I know (and I may just be ignorant), ignoring the very important part of the equation that are caches, it doesn't really matter what row/column you address in RAM, at that level things are dictated by clock speeds.

Caches are obviously very important though, and beyond optimizing the probability of cache hits, on modern CPUs some cores are "closer" than others and thus cache-interactions between them are faster: https://github.com/nviennot/core-to-core-latency And if you optimize based on this you can get speedups, both at the macro level by doing things like scheduling things that talk on "close" cores, and at the micro level by doing things like implementing NUMA aware locking primitives: https://dl.acm.org/doi/10.1145/3477132.3483557

There's also definitely been CPUs (not sure if this is still a thing) where some cores share memory channels and some cores don't so you can access RAM faster (higher bandwidth) if you spread your access between the two sets of cores instead of staying within one.

link

__s 368 days ago

https://www.ilikebigbits.com/2014_04_21_myth_of_ram_1.html

link