by ajuc 4629 days ago
> This implies that no algorithm is ever O(1) for an asymptotically large number of elements--not even hash tables or pointer dereferences.

O(1) is about the number of operations an algorithm requires to finish for a given data size, not about time. So latency doesn't matter.

Also: if the amount of information that can be stored in the universe is finite (most probably it is), then you can make an algorithm that takes the same number of operations regardless of data size (just always pad the input with dummy data up to the physical limit). Thus every algorithm is technically O(1).

Proof: let N be the number of bits we can keep in memory, so the machine has at most 2^N distinct states. Every deterministic algorithm either loops forever or finishes after at most 2^N state changes (otherwise some state would occur twice, and because the algorithm is deterministic, it would then repeat the same sequence of states forever). So if we design an algorithm that, for every input fitting into memory, computes the result and then busy-loops until step 2^N, this algorithm is O(1) no matter what it computes.
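A toy simulation can make the counting argument concrete. This is a minimal sketch, assuming a hypothetical machine whose entire memory is an `n_bits`-bit integer and whose behaviour is a deterministic `step` function that returns the next state, or `None` to halt; `run` and `padded_steps` are illustrative names, not anything from the comment. `run` either halts within `2**n_bits` steps or detects a repeated state, and `padded_steps` adds the busy loop so every halting run reports exactly `2**n_bits` steps.

```python
from typing import Callable, Optional

# Hypothetical machine: the whole memory is one n_bits-bit integer, and
# `step` maps the current state to the next state, or to None when it halts.
Step = Callable[[int], Optional[int]]


def run(step: Step, start: int, n_bits: int) -> tuple[str, int]:
    """Simulate until the machine halts or revisits a state.

    Returns ("halted", steps) or ("loops", steps). There are only
    2**n_bits distinct states, so a run that has not halted after that
    many steps must have revisited one, and a deterministic machine that
    revisits a state cycles forever.
    """
    limit = 2 ** n_bits
    seen = set()
    state: Optional[int] = start
    steps = 0
    while state is not None:
        if not 0 <= state < limit:
            raise ValueError(f"state {state} does not fit in {n_bits} bits")
        if state in seen:
            # Same state again: from here on the run repeats itself forever.
            return ("loops", steps)
        seen.add(state)
        state = step(state)
        steps += 1
    return ("halted", steps)


def padded_steps(step: Step, start: int, n_bits: int) -> int:
    """Run the program, then busy-loop so the total is always 2**n_bits."""
    status, steps = run(step, start, n_bits)
    if status == "loops":
        raise ValueError("program never halts; there is nothing to pad")
    budget = 2 ** n_bits
    # Busy loop: burn dummy steps until the fixed budget is used up.
    while steps < budget:
        steps += 1
    return steps
```

With `n_bits=3`, a program that halts after 6 steps and one that halts after 1 step both report 8 padded steps. That is the "technically O(1)" trick: the constant exists, it is just astronomically large for any real memory.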

There's probably a hole in my understanding somewhere, because algorithmic complexity would be a really useless definition if that were true :)

I think the hole in your understanding is assuming that math (in this case big-O) actually maps to reality. Big-O (and algorithms themselves) is defined entirely in mathematical terms. This model can allow inputs to be arbitrarily large and can allow operations to take constant time. If you want to, you can talk about the algorithmic complexity of an algorithm assuming prime factorization takes constant time. It may not be useful, but there's no reason we cannot talk about it.
Usually the implicit assumption with O notation is that n may go to infinity.
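The prime-factorization example above can be written down as an explicit cost model. This is a minimal sketch, assuming a hypothetical model in which one call to `factor` is charged as a single operation and each multiplication as one more; `factor` and `count_divisors` are illustrative names. The trial division inside `factor` really takes on the order of sqrt(n) steps, but the model simply does not count them.

```python
from collections import Counter


def factor(n: int) -> Counter:
    """Trial-division factorization. Its real running time grows with n;
    the cost model below deliberately ignores that."""
    factors = Counter()
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors[d] += 1
            n //= d
        d += 1
    if n > 1:
        factors[n] += 1
    return factors


def count_divisors(n: int) -> tuple[int, int]:
    """Return (number of divisors of n, cost under the assumed model)."""
    cost = 1  # one call to the "constant-time" factorization primitive
    result = 1
    for exponent in factor(n).values():
        cost += 1  # one multiplication per distinct prime factor
        result *= exponent + 1
    return result, cost
```

Under that assumption `count_divisors(n)` costs 1 plus the number of distinct prime factors of n, which is O(log n); a model that charged for every trial division would give a different answer for the same code.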

Time and the number of operations are equivalent here. As proof, just define the operation as "move an information-carrying photon a tiny distance epsilon". That must take a finite amount of time, as the speed of light is finite, and the number of those operations must increase with the number of randomly accessed elements you're working with, as they're necessary simply to retrieve an element from memory.

Algorithms have the same complexity no matter the machine: bubble sort is O(n^2) whether you use a C64 or a new PC. That's why it uses operations instead of time - to be able to compare algorithms independently of machines it runs on.
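Counting operations rather than seconds can be shown directly. A minimal sketch, assuming the counted operation is one element comparison (swaps are ignored) and using plain bubble sort without the early-exit optimization; `bubble_sort_comparisons` is an illustrative name.

```python
def bubble_sort_comparisons(values: list[int]) -> tuple[list[int], int]:
    """Bubble-sort a copy of `values` and count element comparisons."""
    a = list(values)  # work on a copy so the caller's list is untouched
    n = len(a)
    comparisons = 0
    for i in range(n - 1):
        # After pass i, the largest i + 1 elements are already in place.
        for j in range(n - 1 - i):
            comparisons += 1
            if a[j] > a[j + 1]:
                a[j], a[j + 1] = a[j + 1], a[j]
    return a, comparisons
```

Without the early exit the count is (n-1) + (n-2) + ... + 1 = n(n-1)/2 for every input of length n. Only the time per comparison differs between a C64 and a new PC; the count, and hence the O(n^2), does not.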

Operations are usually defined as addition, multiplication, or comparison. Moving a photon by epsilon isn't a valid operation in any architecture I'm aware of. Even if we use moving an electron by epsilon, you can't tell a Pentium to move one electron by exactly epsilon; it will move many at once, by whatever distance it needs to perform its actual operations.

As for infinity: for all physically possible inputs, the algorithm modified as described above will produce the same output as the algorithms most people consider correct. If we care about infinities, then any algorithm I've seen ever implemented was incorrect - most use integers or floats or doubles so their input space is very limited, and even the ones that use arbitrary-precision math run on machines with a finite amount of memory.

Algorithmic complexity is determined by the complexity of the primitive operations. Most computers have primitive operations that are constant time, and can emulate the primitive operations of other computers in constant time. A notable exception is quantum computers, which have some operations that can be done faster than on classical computers. Another exception is the Turing machine, which takes O(n) time to look up a random value in memory, whereas RAM-based machines can do that in O(1) time.
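The Turing-machine exception comes down to how memory is reached. A minimal sketch, assuming two hypothetical memory models: `Tape`, where a single head must walk cell by cell to position i (each move plus the final read counts as one step), and `Ram`, where any cell is read in one step. Neither is a full machine; both only count the cost of lookups.

```python
class Tape:
    """Hypothetical one-tape memory: the head moves one cell per step."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.head = 0
        self.steps = 0

    def read(self, i):
        # Walk the head to cell i (|i - head| moves), then one step to read.
        self.steps += abs(i - self.head) + 1
        self.head = i
        return self.cells[i]


class Ram:
    """Hypothetical random-access memory: any cell is one step away."""

    def __init__(self, cells):
        self.cells = list(cells)
        self.steps = 0

    def read(self, i):
        self.steps += 1
        return self.cells[i]
```

Reading the last of n cells from a head parked at cell 0 costs n steps on the tape and 1 step in RAM, so the same lookup is O(n) on one model and O(1) on the other.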
> That's why it uses operations instead of time - to be able to compare algorithms independently of machines it runs on.

Splitting hairs here. You can talk about operations or time; it's the same thing, since operations are sequential. One operation follows another. You can count them when you are done to get the total, and that total is also referred to as the "time" in this context.

> I've seen ever implemented was incorrect - most use integers or floats or doubles so their input space is very limited

So are we talking about specific hardware here or not? I thought we weren't. The ambiguity, and the point of discussion, exists because one can define what one considers a "constant" operation. You can say it is a hypothetical von Neumann architecture machine and that these operations (op1, op2, ...) take constant time. Now we compare two algorithms and see how they do.
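Here is what "declare the constant-time operations, then compare two algorithms" can look like. A minimal sketch, assuming a hypothetical machine on which one comparison of a list element with the target is the only charged operation; `linear_search` and `binary_search` are illustrative and return the index found (or -1) together with the number of comparisons.

```python
def linear_search(xs: list[int], target: int) -> tuple[int, int]:
    """Scan left to right; return (index or -1, comparisons)."""
    comparisons = 0
    for i, x in enumerate(xs):
        comparisons += 1
        if x == target:
            return i, comparisons
    return -1, comparisons


def binary_search(xs: list[int], target: int) -> tuple[int, int]:
    """`xs` must be sorted. Return (index or -1, comparisons)."""
    lo, hi = 0, len(xs)  # search the half-open range [lo, hi)
    comparisons = 0
    while lo < hi:
        mid = (lo + hi) // 2
        comparisons += 1  # one three-way comparison of xs[mid] with target
        if xs[mid] == target:
            return mid, comparisons
        if xs[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return -1, comparisons
```

On 1024 sorted elements the linear search needs up to 1024 comparisons and the binary search at most 11. That gap (n versus about log2 n) is the same whatever a comparison actually costs on real hardware.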

Moving data from one place to another (like in a sort) is also an operation.