Hacker News

by codex 4676 days ago

A simple thought experiment suffices here. What shape holds the most physical bits while minimizing the worst-case latency for random access? A sphere. Each bit occupies some fixed amount of space packed within that sphere. The radius of the sphere is the distance that light must traverse, and thus corresponds to latency.

"Slow" elements of the memory hierarchy are on the outside of the sphere, while faster elements (cache, registers, etc.) are layered on the inside, like an onion. Since those inner spheres are smaller, they must hold fewer bits, but they are, by definition, faster.

The total number of bits you can store is a function of the volume of the sphere. The number of bits available at a given latency level is a function of the surface area of the sphere at the corresponding radius.
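
A minimal sketch of the onion picture, assuming bits are packed at a uniform, hypothetical density `bits_per_m3`: each latency level is a spherical shell between an inner and an outer radius, and its capacity is the density times the shell's volume. The function name is illustrative only.

```python
import math


def shell_capacity(r_inner: float, r_outer: float, bits_per_m3: float) -> float:
    """Bits that fit in the shell r_inner <= r <= r_outer (meters),
    assuming a uniform, hypothetical packing density in bits per cubic meter."""
    if not 0 <= r_inner <= r_outer:
        raise ValueError("need 0 <= r_inner <= r_outer")
    volume = 4.0 / 3.0 * math.pi * (r_outer**3 - r_inner**3)
    return bits_per_m3 * volume


# Three equally thick shells: the outer (slower) ones hold far more bits.
density = 1.0  # hypothetical unit density, for illustration only
for r0, r1 in [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]:
    print(r0, r1, round(shell_capacity(r0, r1, density) / (4 / 3 * math.pi)))
# capacities come out in the ratio 1 : 7 : 19
```

For a thin shell of thickness dr, the volume difference is approximately the surface area 4πr² times dr, which is why capacity per latency level tracks surface area.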

The volume of a sphere is $V = \frac{4}{3}\pi r^3$. Because latency is a function of the radius (the time it takes light to travel to the edge of the sphere and back), and the number of bits $n$ is proportional to $r^3$, the radius grows as $n^{1/3}$. So latency must rise at least as fast as the cube root of the number of bits you want to store. That is the best possible bound.
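
The cube-root bound can be sketched numerically. This assumes a uniform, hypothetical packing density and takes latency as the round-trip light time 2r/c to the edge of the smallest sphere holding n bits; real memories are far slower, so this is only a lower bound, and the function name is illustrative.

```python
import math

C = 299_792_458.0  # speed of light in vacuum, m/s


def min_round_trip_latency(n_bits: float, bits_per_m3: float) -> float:
    """Lower bound (seconds) on random-access latency for n_bits packed into a
    sphere at an assumed uniform density: light must reach radius r and return."""
    # Solve n = density * (4/3) * pi * r^3 for r.
    r = (3.0 * n_bits / (4.0 * math.pi * bits_per_m3)) ** (1.0 / 3.0)
    return 2.0 * r / C


density = 1e27  # hypothetical bits per cubic meter, for illustration only
t1 = min_round_trip_latency(1e12, density)
t8 = min_round_trip_latency(8e12, density)
print(t8 / t1)  # approximately 2.0: 8x the bits, only 2x the minimum latency
```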

This implies that no algorithm is ever O(1) time for an asymptotically large number of randomly accessed elements, not even hash tables or pointer dereferences. They're at best $O(n^{1/3})$.

ajuc 4676 days ago

> This implies that no algorithm is ever O(1) for an asymptotically large number of elements--not even hash tables or pointer dereferences.

O(1) is about the number of operations an algorithm requires to finish for a given input size, not about time. So latency doesn't matter.

Also: if the amount of information that can be kept in the universe is finite (most probably it is), then you can make an algorithm that takes the same number of operations no matter the data size (just always pad the input with dummy data up to the physical limit). Thus every algorithm is technically O(1).

Proof: let N be the number of bits that we can keep in memory. Every deterministic algorithm either loops forever or finishes after at most 2^N state changes (otherwise it would be in the same state twice, and since it is deterministic, the same state always has the same continuation, so it would cycle forever). So if we design an algorithm that, for every input fitting into memory, calculates the result and then busy-loops for the remaining steps until step 2^N, that algorithm is O(1) no matter what it does.
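
The pigeonhole step of this argument can be sketched as follows. The model is an assumption for illustration: a machine's whole state is an integer below 2^N, and a hypothetical transition function returns the next state, or None when the machine halts. Either every visited state is distinct (so it halts within 2^N steps) or a state repeats and the machine cycles forever. `run_padded` is a hypothetical helper that reports the padded step count rather than actually spinning.

```python
from typing import Callable, Optional

# Hypothetical model: state is an int in range(2**n_bits); step() gives the
# next state, or None when the machine halts.
Step = Callable[[int], Optional[int]]


def run(step: Step, start: int) -> tuple[str, int]:
    """Return ("halted", steps) or ("loops", steps taken before a state repeated)."""
    seen: set[int] = set()
    state: Optional[int] = start
    steps = 0
    while state is not None:
        if state in seen:
            # Same state twice + determinism => same successors forever.
            return ("loops", steps)
        seen.add(state)
        state = step(state)
        steps += 1
    # Every visited state was distinct, so steps <= number of possible states.
    return ("halted", steps)


def run_padded(step: Step, start: int, n_bits: int) -> int:
    """Padding trick: count a halting run as exactly 2**n_bits steps.
    (Returns the padded count instead of really busy-waiting.)"""
    outcome, steps = run(step, start)
    if outcome != "halted":
        raise RuntimeError("machine never halts")
    busy_wait = 2**n_bits - steps  # dummy steps that do no useful work
    return steps + busy_wait


def countdown(s: int) -> Optional[int]:
    return None if s == 0 else s - 1  # halts from any start state


def rotate(s: int) -> Optional[int]:
    return (s + 1) % 2**4  # visits all 16 states of a 4-bit machine, never halts


print(run(countdown, 5))  # ('halted', 6)
print(run(rotate, 0))  # ('loops', 16)
print(run_padded(countdown, 5, 4), run_padded(countdown, 15, 4))  # 16 16
```

With padding, every halting input "costs" the same 2^N steps, which is exactly why the definition would become useless if N were treated as fixed.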

There's probably a hole in my understanding somewhere, because algorithmic complexity would be a really useless definition if that were true :)

gizmo686 4676 days ago

I think the hole in your understanding is assuming that math (in this case big-O) actually maps to reality. Big-O (like algorithms themselves) is defined entirely in mathematical terms. The model can allow inputs to be arbitrarily large and can allow operations to take constant time. If you want to, you can talk about the complexity of an algorithm assuming prime factorization takes constant time. It may not be useful, but there's no reason we can't talk about it.

codex 4676 days ago

Usually the implicit assumption with O notation is that n may go to infinity.

Time and the number of operations are equivalent here. As proof, just define the operation as "move an information-carrying photon a tiny distance epsilon". That must take a nonzero amount of time (at least epsilon/c), since the speed of light is finite, and the number of those operations must increase with the number of randomly accessed elements you're working with, since they're necessary simply to retrieve an element from memory.

ajuc 4676 days ago

Algorithms have the same complexity no matter the machine: bubble sort is O(n^2) whether you run it on a C64 or a new PC. That's why it uses operations instead of time - to be able to compare algorithms independently of machines it runs on.
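
A sketch of what "counting operations" means here: a plain bubble sort instrumented to count comparisons, with the comparison taken as the unit operation (an assumption; any fixed primitive would do). The count depends only on n, namely n(n-1)/2, not on how fast the machine executes each comparison. The function name is illustrative.

```python
def bubble_sort_comparisons(items: list) -> tuple[list, int]:
    """Bubble sort a copy of items; return (sorted_list, number_of_comparisons)."""
    a = list(items)
    comparisons = 0
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):
            comparisons += 1  # one unit operation, on any machine
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons


for n in (10, 100, 1000):
    _, ops = bubble_sort_comparisons(list(range(n, 0, -1)))
    print(n, ops, n * (n - 1) // 2)  # the measured and predicted counts agree
```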

Operations are usually defined as addition, multiplication, or comparison. Moving a photon by epsilon isn't a valid operation in any architecture I'm aware of. Even if we use moving an electron by epsilon, you can't tell a Pentium to move one electron by exactly epsilon; it will move many at once, and it will move them by whatever distance it needs to perform its actual operations.

As for infinity: for all physically possible inputs, the algorithm modified as described above will produce the same output as the algorithms most people consider correct. If we care about infinities, then any algorithm I've seen ever implemented was incorrect - most use integers or floats or doubles so their input space is very limited, and even the ones that use arbitrary-length math run on machines with a finite amount of memory.
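
To illustrate the limited input space of fixed-width types, here is a sketch that emulates 32-bit two's-complement addition in Python (whose own integers are arbitrary precision). The helper `add_i32` is hypothetical: a 32-bit integer can represent only 2^32 values, so results outside that range silently wrap around.

```python
def add_i32(a: int, b: int) -> int:
    """Add two integers the way a 32-bit two's-complement register would (wrapping)."""
    result = (a + b) & 0xFFFFFFFF  # keep only the low 32 bits
    return result - 2**32 if result >= 2**31 else result  # reinterpret as signed


print(add_i32(2**31 - 1, 1))  # -2147483648: the sum wrapped around
```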

gizmo686 4676 days ago

Algorithmic complexity is determined by the complexity of the primitive operations. Most computers have primitive operations that are constant time and can emulate the primitive operations of other computers in constant time. A notable exception is quantum computers, which have some operations that can be done faster than on classical computers. Another exception is the Turing machine, which takes O(n) time to look up a random value in memory, whereas RAM-based machines can do that in O(1) time.
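
A toy cost model for the Turing machine versus RAM contrast, under stated assumptions: a single tape head pays one step per cell it moves, while a RAM-style machine pays one unit per access regardless of address. The function names are hypothetical, and the model ignores everything except head movement.

```python
import random


def tape_cost(addresses: list[int], start: int = 0) -> int:
    """Steps a single tape head needs to visit the addresses in order (one per cell)."""
    head, steps = start, 0
    for addr in addresses:
        steps += abs(addr - head)  # must walk cell by cell
        head = addr
    return steps


def ram_cost(addresses: list[int]) -> int:
    """RAM model: every access costs one unit, whatever the address."""
    return len(addresses)


random.seed(0)
for n in (1_000, 10_000, 100_000):
    accesses = [random.randrange(n) for _ in range(1_000)]
    # Average tape cost per access grows roughly linearly with n; RAM cost stays 1.
    print(n, tape_cost(accesses) / len(accesses), ram_cost(accesses) / len(accesses))
```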

rdtsc 4676 days ago

> That's why it uses operations instead of time - to be able to compare algorithms independently of machines it runs on.

Splitting hairs here. You can talk about operations or time; they're the same thing, since operations are sequential. One operation follows another. When you're done, you can count them to get a total, and that total is also referred to as the "time" in this context.

> I've seen ever implemented was incorrect - most use integers or floats or doubles so their input space is very limited

So are we talking about specific hardware here or not? I thought we weren't. The ambiguity, and the point of discussion, exists because one can define what one considers a "constant" operation. You can posit a hypothetical von Neumann architecture machine on which these operations (op1, op2, ...) take constant time. Then we compare two algorithms and see how they do.
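
That framing (fix a machine model, declare which operations cost one unit, then compare algorithms) can be sketched in code. The model is an assumption for illustration: only element comparisons are counted, and linear versus binary search are example algorithms chosen here, not ones from the thread.

```python
from collections import Counter


def linear_search(a: list[int], target: int, ops: Counter) -> int:
    for i, x in enumerate(a):
        ops["compare"] += 1  # the unit-cost primitive in this model
        if x == target:
            return i
    return -1


def binary_search(a: list[int], target: int, ops: Counter) -> int:
    lo, hi = 0, len(a) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        ops["compare"] += 1  # a three-way comparison counted as one unit
        if a[mid] == target:
            return mid
        if a[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


data = list(range(1_000_000))
lin, bis = Counter(), Counter()
linear_search(data, 999_999, lin)
binary_search(data, 999_999, bis)
print(lin["compare"], bis["compare"])  # about 10**6 versus about 20
```

Changing the model (for example, charging more for a comparison) scales both counts by the same constant, which is why the comparison between the two algorithms survives the choice of machine.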

chromaton 4676 days ago

Moving data from one place to another (like in a sort) is also an operation.
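
Counting data moves separately from comparisons makes this concrete. A sketch using selection sort, chosen here only because its two counts diverge: it always performs n(n-1)/2 comparisons but at most n-1 swaps, so which primitive you count changes the cost you report. The function name is illustrative.

```python
def selection_sort_counts(items: list) -> tuple[list, int, int]:
    """Selection sort on a copy; return (sorted_list, comparisons, swaps)."""
    a = list(items)
    comparisons = swaps = 0
    n = len(a)
    for i in range(n - 1):
        smallest = i
        for j in range(i + 1, n):
            comparisons += 1
            if a[j] < a[smallest]:
                smallest = j
        if smallest != i:
            a[i], a[smallest] = a[smallest], a[i]  # a data move: also an operation
            swaps += 1
    return a, comparisons, swaps


print(selection_sort_counts([5, 4, 3, 2, 1]))  # ([1, 2, 3, 4, 5], 10, 2)
```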

Scaevolus 4676 days ago

This is right for theoretical limits, but modern chips are fabricated as stacked 2D layers, forming planes rather than spheres. This changes how the number of bits within reach grows with distance from the core, from cubic to quadratic. In Nehalem, the 64KB L1 cache has a 4-cycle latency, while the 256KB L2 cache (4x larger) has a 10-cycle latency (2.5x slower).
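
The planar-versus-spherical difference can be checked with a small calculation, under the idealized assumption that latency scales with the radius needed to hold a given capacity: in 2D, capacity grows as r², so latency grows as capacity^(1/2); in a 3D ball, capacity grows as r³, so latency grows as capacity^(1/3). The Nehalem figures are the ones quoted in the comment; real cache latency depends on much more than distance, so this is only a rough comparison, and the function name is illustrative.

```python
def predicted_latency_ratio(capacity_ratio: float, dims: int) -> float:
    """Idealized latency growth when capacity grows by capacity_ratio,
    assuming capacity ~ radius**dims and latency ~ radius."""
    return capacity_ratio ** (1.0 / dims)


l1_kb, l1_cycles = 64, 4  # Nehalem L1, as quoted in the comment
l2_kb, l2_cycles = 256, 10  # Nehalem L2, as quoted in the comment

ratio = l2_kb / l1_kb  # 4.0
print("observed:", l2_cycles / l1_cycles)  # 2.5
print("2D model:", predicted_latency_ratio(ratio, 2))  # 2.0
print("3D model:", round(predicted_latency_ratio(ratio, 3), 3))  # 1.587
```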
