Hacker News new | ask | show | jobs
by samueloph 1523 days ago
I'm starting to question my understanding of algorithmic complexity, the author says things like "complexity being worst case O(n \log n)" and seems to talk about big O notation used for average cases as well.

I've learned that big O notation is always and exclusively about the worst case, for average and best cases we use Theta Θ and Omega Ω, respectively.

Do people not follow these definitions strictly? Has the author committed a mistake or am I missing something?

I did experience an interviewer once saying that my complexity analysis was wrong as the worst case would be extremely rare to hit, I did try to explain nicely that big O notation is always about the worst case, and that they were thinking about big Theta instead, now I wonder if I was wrong.

15 comments

O, Theta, and Omega are independent from best, worst, and average case. You can mathematically establish upper, lower, or exact bounds on any of the average, best, or worst cases.

It is perfectly valid to say "the best case is O(n)" which means the best case scales no worse than linearly. The "no worse" here is not describing the best case, but rather the strictness of your bound. You could say a sort algorithm is O(n!) since, yes, it does scale no worse than n! but it's not particularly helpful information.

Big O notation is used imprecisely frequently (see also people saying a dataset is O(millions) to describe scale).

Big O, Theta, and Omega notations all ultimately denote sets of functions. So e.g. O(n) is the set of all functions which are eventually overtaken by some linear function.

When we say "X is O(f(n))" what we really mean is "the function that characterizes X is contained within the set of functions denoted by O(f(n))." Anything that can be described by a function hence can be given Big O, Theta, or Omega notation. So yes worst case time complexity could be described in this notation, but so could average time complexity.

But we could also describe things that have nothing to do with time or space complexity with big O notation as well. As long as we've turned it into a function, we can describe it with big O notation.

I think you are mistaken here. The concept of worst/best/amortized case is different from O/Theta/Omega. The latter is about the direction with which it is bounding.

Big O/Theta/Omega denotes sets of functions, and each of them can be applied to best/worst/amortized complexity analysis.

I like to use the metaphor of driving somewhere, and trying to estimate when you're going to get there. Maybe you have a hotel reservation, and you want to make sure you don't get there too early nor too late. Then (with some times filled in for example):

  +---------------------------------------+-------------------------------+--------------------------------+----------------------------------------+
  |                   .                   | Best Case (little/no traffic) | Average Case (average traffic) | Worst Case (car accident, snow-in etc) |
  +---------------------------------------+-------------------------------+--------------------------------+----------------------------------------+
  | Max time, upper bound (Big O)         | 30 minutes                    | 45 minutes                     | 1 hour 20 minutes                      |
  | Both min and max, tight bound (Theta) | 25 minutes                    | 35 minutes                     | 1 hour                                 |
  | Minimum time, lower bound (Omega)     | 15 minutes                    | 25 minutes                     | 50 minutes                             |
  +---------------------------------------+-------------------------------+--------------------------------+----------------------------------------+
So then you can start asking questions like - if I assume I'm in the average case, and I can show up earliest at 20 minutes from now, and show up latest at 50 minutes from now, can I guarantee that I will make it within that window? The answer in this case is yes, because our range is 25-45 minutes, so the earliest I can show up is in 25 minutes and the latest in 45 minutes.

This also helps explain why Big O is generally more useful - we can almost always be early to something, but we can almost never be late to something. So showing up 5 minutes early usually is not a problem.

It also shows why Theta is better than both - in this case, if we had this table we could completely ignore both the Big O row and Omega row, because Theta is a better bound for both min and max time.

Of course, in algorithmic complexity we replace the times with functions in terms of some variables, usually just n.

Your example is nice, but extremely informal, as these are all constant numbers, while O/Theta/Omega (and o, omega, theta) are about functions.

I think the most important thing that needs to be clarified though is that algorithmic complexity is first and foremost a function, `WorseCaseComplexity(size of input) = max number of steps for any input of that size`. This function is often then classed into a "complexity class" using O/Theta/Omega etc, but that is only done for the sake of easier comparison of complexities between algorithms.

>It also shows why Theta is better than both

Big Theta for some functions is an empty set. Take for example bubble sort. It's O(n^2) and Ω(n), but there is no function that asymptotically bounds it above and below. In fact it is better if there isn't one as it means there exists fast cases. For bubble sort it's fast cases are ones where data is almost sorted and requires to scans of the array. If you know this is true of the input data you know bubble sort is more appropriate than something like merge sort which is Ω(nlog(n)). Now the O(n^2) worst case can still be significant so you may want to consider an alternate algorithm that doesn't break down as bad.

As the other reply pointed out we are talking about functions that bound some other function (in this context the amount of steps / operations / time an algorithm will take) when looking at how it behaves asymptotically.

Note that "the amount of steps / operations / time an algorithm will take" is not in general a mathematical function, as for the same input size, different inputs will give different numbers of steps. For bubble sort for example there doesn't exist a function f(n) = number of steps bubble sort will take for an input of size n. There does exist a function best_case_complexity(n) = number of steps bubble sort will take for the best input of size n; and a function worst_case_complexity(n) = number of steps bubble sort will take for the worst input of size n.

Then, we can say best_case_complexity(n) = Theta(n), and worst_case_complexity(n) = Theta(n^2). We can also define the set of all functions representing the number of steps bubble sort will take for input of a certain "shape" of size n. We can then indeed say that any function in this set will be <= worst_case_complexity(n) and >= best_case_complexity(n), so this set will indeed be a subset of Omega(n) and of O(n^2).

But I have the same table, but instead of time, I have the turns need to be taken by the car. That's not always useful.
In school, we used big O notation for all cases. I've never heard about theta or omega notation.

Wikipedia's bubble sort page -- as an example -- also uses it for all cases.

https://en.wikipedia.org/wiki/Bubble_sort

You are mixing two things together. The "average/expected case" and the "worst case" are two different functions, each with their own O, Θ, Ω.
Theta(X) is not "average case" behaviour, it means "both Omega(X) and O(X)".

Ex: merge sort is Theta(n log(n)), but quick sort is not.

The author is likely saying "worst case O(n log(n))" as redundancy to emphasize that this is worst case behaviour.

It's not redundant. You can have "average case O(n log(n))" but "worse case O(n^2)", like quicksort does. Depending on the pivot function, that "worse case" can be more or less common.
It is not redundant to say "average case O(n log(n))". It's not formally a correct usage of O(). But colloquially, it means that the average number of operations over all possible inputs on length n is X.

It is redundant to say "worst case O(n^2)". By definition, O(X) means that the maximum of the number of operations over all possible inputs of length n is at most X.

You are incorrect about this. Worst case and Big O are independent of one another. The O, Ω, Θ, etc... are about describing bounds where O(f) is used for the set of asymptotic upper bounds on f, Ω(f) is the set of asymptotic lower bounds on f, and Θ(f) is the intersection of O(f) and Ω(f).

Typically f is used to describe space or time complexity of an algorithm, but it can be used for any function that maps positive integers to the non-negative real numbers.

You can make any combination of best case/worst case/average case with O, Ω, Θ. In fact, you can even consider other cases as well such as amortized cases, 90th percentile cases, any case you want to investigate that can be defined as a function from positive integers to non-negative real numbers can be classified as having an asymptotic upper bound, lower bound, or tight bound.

Yeah, okay. You are correct here.

"Average-case of algorithm X" can be a function function (average number of operations for all possible inputs of length n). And asymptotic bounds can be applied to arbitrary functions from R -> R.

> It is not redundant to say "average case O(n log(n))". It's not formally a correct usage of O().

You are wrong. It is perfectly formally correct to say "average case complexity is 2n log(100n) + 78n + 90, which is O(n log (n))".

> By definition, O(X) means that the maximum of the number of operations over all possible inputs of length n is at most X.

No, that is, in fact, a colloquial use of O(X). O(X), formally, is simply the set of all functions that are bigger than X up to some constant factor. It has nothing to do with number of operations in itself.

Instead, we formally define complexity functions for an algorithm, which are the minimum/average/maximum number of operations the algorithm will execute for any input of size N; call these functions Cmin(n), Cavg(n), Cmax(n). Then, we can say that C_(n) is in O(X(n)) (or, as a shorthand, C_(n) = O(X(n))) if there exists a number k such that C_(n) < k * X(n) for any n > n0.

So, we can say that the best case complexity of an algorithm, Cmin(n), is O(n^2); this means that even for the best possible input of length n, it will execute at least k * n^2 steps, for some k. Or, we can say that the worse case complexity of some algorithm, Cmax(n), is Omega(n^2); that is, that even in the worse case input of length n, it will execute less than k * n^2 steps, for some k.

In fact, the concept "number of steps that an algorithm will execute for any input of size n" is not a mathematical function at all (unlike min/max/avg number of steps), since there isn't, in general, a unique such number for any given size.

So, I would argue that saying "complexity O(n^2)" is in fact ill-defined formally, and that, if you want to be formal, you must specify "worst case complexity O(n^2)".

I think you're right on the formalism, and I shouldn't have brought that in. I've been out of grad school a while and should clearly brush up on my theory.

My main line of reasoning is:

1. If I say "this algorithm time-complexity is O(X)", I mean that all possible cases are in O(X).

2. If all possible cases are in O(X), then the worst case is in O(X).

3. If the worst case is in O(X), then any other case will also be in O(X). This follows from our definition of "worst case".

4. Therefore, "all possible cases are in O(X)" <=> "worst case is in O(X)"

5. Therefore, "this algorithm time-complexity is O(X)" <=> "this algorithm time-complexity is worst-case O(X)"

If your thought is that 1) isn't a formal definition, fair enough. It may have been a convention we adopted within the department. But aside from that, I think the reasoning is solid.

Would argue it isn't perfectly formally correct, because "average case complexity" isn't well defined. Is average here mean, median or mode? It is typically used as if it meant, 90th percentile, or all but a few degenerate cases.

But that is a bit nit picky.

I'm not sure I've ever seen anyone argue over which measure of central tendency to use to assess the "average case". You certainly could, but it doesn't strike me as a particularly productive argument; in common parlance, an average is essentially always either an arithmetic mean or a non-rigorous informal typical case unless otherwise specified; it also normally doesn't matter very much since most algorithms of interest do have their complexity distributed across inputs s.t. as input size goes to infinity one regime describes the behavior on almost all inputs. It's surprisingly hard to even come up with one without just having having an approximately constant fraction of inputs just be trivial cases.

The much more important problem with "average case" complexity, particularly infamously for sorts, is just that practical "typical" inputs aren't distributed evenly over the space of possible inputs; pathological inputs have, after all, practically by definition, a nontrivial almost–computable property that informally gives them more of a reason to exist than than the average input.

There are variations, that's what we learned back at Uni. Expected O(f(n)), average O(f(n)), amortised O(f(n)), even expected amortised O(f(n)). "Expected" implied a randomised algorithm.
> "Expected" implied a randomised algorithm.

Not necessarily. It can be the "expected" number of steps over the range of unknown inputs even for a deterministic algorithm.

The other replies are great, but just to put it concisely: big O is ‘worst case’ in terms of input size, while the ‘best case big O’ is ‘best case’ in terms of the input contents (‘shape’).
Worst case, median case, and best case complexity have nothing to do with O/Θ/Ω.

When analyzing an algorithm, you want to compute the number of steps it has to execute in certain cases as a function of its input size(s). That function will be something like 2n^2 + 67n + 189 steps. Since this is too much information, you can then express it instead as O(n^2) (or O(n^100)) or Ω(n^2) (or Ω(25n) ) or Θ(n^2).

You can then also compute the best case running time of the algorithm as a function of the size of the input, and get 2n^2 + 67n. Then you can say that the best case running time is also O(n^2).

Finally, you can (sometimes) define an "average" running time.

Now, what do we mean by "shortest running time"? Is that not always for n=0 or something? No - the best case running time has to do with the running time in a case where the input has a certain shape. For some sorting algorithms for example, if the input is already sorted, they will only go through the list once and do nothing, so they have a best case running time that is O(n). Other sorting algorithms will still perform steps even if the input is already sorted, so even their best case running time may be O(n log(n)).

Now, you can say that, given any possible input, the running time of the algorithm is O(worse case running time) and Ω(best case running time). You definitely CAN'T say in general that it is Θ(average case running time) though, as in will sometimes run in time that is shorter than "average case running time" (for best case scenarios); or sometimes it will run in time that is longer than "average case running time" (for worse case scenarios).

Basically, the worse/best/average case running times are an analysis of how properties other than the size of the input affect its running time.

Big-O notation and its friends have nothing to do with complexity analysis. They are a general notation for the asymptotic behavior of mathematical functions. Big-O is usually understood as an upper bound on a function, big-Ω is a lower bound, and Θ is a tight bound (the function f is by definition always within a factor of k above or below Θ(f)).

Edit: let's take an example. Say we have the following algorithm:

  isAllOdd(array x):
    if x is empty:
      return true
    if x[0] is even:
      return false
    return isAllOdd(x[1:])
In the best case, for any size array with the first element being even, this function finishes after executing 4 instructions: 2 comparisons, an array access and then a return. So it's best case complexity is best(n) = 4, which is O(1), Θ(1), Ω(1).

In the worse case, all elements of the array are odd, so we have to go through all n elements to find that out. It will have to execute 6 instructions for each element in the array (2 comparisons, 2 array accesses, a call and a return), except for the last element where it will execute only 2 instructions (check size then return). So its worst case complexity function is worst(n) = 6(n-1) + 2, which is O(n), Θ(n), Ω(n).

In the average case, for every call we are flipping a coin, and we have n total flips. Since we can assume the coin is fair, we will expect to see an even number after log(n/2) calls of isAllOdd. So, our average case complexity is average(n) = 6*log((n-1)/2) + 2 steps, which is O(log n), Θ(log n), Ω(log n). (I am ignoring cases where n < 2, which is anyway irrelevant for asymptotic behavior).

Now, what is the overall complexity of isAllOdd? We can generalize the average case calculation if we want to get an exact value; but whatever that function, overall(n), is, we know for sure best(n) < overall(n) < worst(n); so overall(n) is O(n) and Ω(1). However, we can't define any Θ for it - there is no function which it will differ from by at most a constant factor for any n.

Why do the detailed analysis instead of saying this is O(n) and being done with it? Well, perhaps we know that we are in fact calling this function with a large number of even numbers, so we would actually over-estimate its complexity. If we instead know we are calling it with a large number of odd numbers, we would also find out from the average case analysis that it would be better to define a reverse funtion, isAllEven, and compute isAllOdd = !isAllEven.

I always used big O for worst case (by default). I did think that was common.

Theta and Omega I use differently however.

I sometimes say, best case if X then it's O(n) or whatever.

You should look into amortized complexity. That's the formalized name for worst case runtime over multiple trials.
I remember from college there was big-Oh `O(x)` and little-Oh `o(x)`. Is Omega/Theta the same as little-Oh?
No, big O is a kind of loose upper bound, while little-O is a very strict upper bound.

  f(x) = O(g(x)) <=> there exists k, x0 such that f(x) < k * g(x) for any x > x0

  f(x) = o(g(x)) <=> for any k > 0, there exists x0 s.t. f(x) < k * g(x) for any x > x0
For example:

  f(n) = 3*n^2 + 2n + 1

  f(n) = O(n^2)? 
     let's take k = 10
     3*n^2 + 2n + 1 < 2 * n^2 for any n > n0?
     n = 0: 3 * 0^2 + 2*0 + 1 < 10 * 0^2 <=> 1 < 0 false
     n = 1: 3 * 1^2 + 2*1 + 1 < 10 * 1^2 <=> 6 < 10 true
     n = 2: 3 * 2^2 + 2*2 + 1 < 10 * 2^2 <=> 17 < 40 true
     ...
     so, yes, f(n) = O(n^2) (n0 = 1)

  f(n) = o(n^2) ? 
    let's take k = 2.
    
     3*n^2 + 2n + 1 < 2 * n^2 for any n > n0?

     3 * n^2 + 2 * n + 1 < 2 * n^2      <=>
     3 * n^2 + 2 * n + 1 - 2 * n^2 < 0  <=>
     n^2 + 2 * n + 1 < 0 false for any n > 0.

     so no, f(n) != o(n^2). However, f(n) = o(10 * n^2), and f(n) = o(n^3).

Big-Omega and small-Omega are the same idea for a lower bound.

Big-Theta is a tight bound around the function:

  f(n) = Theta(g(n)) <=> there exists k1, k2, n0 such that k1*g(n) < f(n) < k2*g(n), for any n > n0.
There is no equivalent "little-theta", as that would imply that the relation must hold for any k1 and k2, which would at best leave only the function itself as fulfilling the condition.
Computer scientists stole some notation from mathematicians, but then slightly changed and substantially slackened the definition...
No, not really. CS and other branches of maths are using the exact same definitions of O,o,Θ, and ω. It is true that Ω is used with a different definition in CS, but it's stricter not more slack.

As explained elsewhere though, they are not notations for algorithmic complexity, they are notations for function categorization. It just happens that a common use for this notation in CS is to categorize the function "best/average/worse/... case complexity of an algorithm", but they are also used for other purposes.

In my experience, a lot of the time when programmers say big O, they actually mean big Theta...
Programmers need not be very good computer scientists. The actual scientists publishing papers on algorithms usually get it right.
Yes, I agree with that. However, if f(n)=Theta(n) => f(n)=O(n), so they are usually not wrong, technically, just unnecessarily broad.
In my opinion, nothing about big-O and related omega, theta, etc, is at all strict.

Maybe I value "pure symbolic clairty" too much. I do see a very pragmatic usefullness to the complexity notation.