Hacker News new | ask | show | jobs
by stonemetal12 1524 days ago
Would argue it isn't perfectly formally correct, because "average case complexity" isn't well defined. Is average here mean, median or mode? It is typically used as if it meant, 90th percentile, or all but a few degenerate cases.

But that is a bit nit picky.

1 comments

I'm not sure I've ever seen anyone argue over which measure of central tendency to use to assess the "average case". You certainly could, but it doesn't strike me as a particularly productive argument; in common parlance, an average is essentially always either an arithmetic mean or a non-rigorous informal typical case unless otherwise specified; it also normally doesn't matter very much since most algorithms of interest do have their complexity distributed across inputs s.t. as input size goes to infinity one regime describes the behavior on almost all inputs. It's surprisingly hard to even come up with one without just having having an approximately constant fraction of inputs just be trivial cases.

The much more important problem with "average case" complexity, particularly infamously for sorts, is just that practical "typical" inputs aren't distributed evenly over the space of possible inputs; pathological inputs have, after all, practically by definition, a nontrivial almost–computable property that informally gives them more of a reason to exist than than the average input.

The claim was "perfectly formally correct". If the average is arithmetic mean of all possible inputs, then average case O is worst case O because the worst case term dominates. Therefore they are using an informal definition of average, which doesn't jive with formally correct.
> then average case O is worst case O because the worst case term dominates

No, you're overgeneralizing. It's true that the worst case term dominates if e.g. a constant fraction of inputs exhibit the worst-case behavior, so that average complexity is lower bounded by a constant fraction of the worst-case behavior.

This isn't true for the kind of algorithms where it's customary to talk about average case behavior. We only normally talk about average cases when as input size increases, the fraction of inputs with the higher complexity bound shrinks faster. For example, we can imagine an algorithm which, on inputs of size n, requires n^2 operations on 1/n of possible inputs, and n operations on the other (n-1)/n inputs. Then the average runtime is, formally, (n^2)*(1/n)+n*(n-1)/n=2n-1, which is still O(n), even though the worst-case runtime is still O(n^2).

Where this goes wrong is, e.g. perhaps the O(n^2) runtime occurs on "trivial" inputs which are disproportionately likely to appear. For example, if we the above hypothetical function is idempotent and runs in O(n^2) on all of its possible outputs, but O(n) on all other inputs, it'd be easy to accidentally quadratic by calling it on its result repeatedly under the assumption that it's safe to do so.*