Introduction to Computational Complexity
Contents
Introduction to Computational Complexity#
One of the obvious questions we might ask about an algorithm or bit of code is “how long will it take to run?”. Unfortunately, answering this question precisely is essentially impossible. One reason for this is that the behaviour of modern computers is highly optimized and complex. Instead, we often look for an estimate for how the runtime of the algorithm or code behaves asymptotically as the input size increases  e.g., does it scale linearly, quadratically, logarithmically, etc. We also tend to consider number of steps or basic operations taken rather than time, since these are more independent of any particular computer.
In our context, a “step” or “basic operation” is something like accessing or modifying an element of a list, adding two numbers, comparing two numbers, or assigning a value to a variable. A quick disclaimer: the discussion here is necessarily a bit vague; to define these notions properly you need to talk about things like a model of computation. See MT4512 Automata, Languages, and Complexity for a more precise account.
We will assume that these basic operations take some constant amount of time each. The joy of complexity analysis is that because we are only trying to estimate how the algorithm scales, we get to ignore constants. For example, if an algorithm takes \(xn\) steps on an input of size \(n\), where \(x\) is some constant, then for our purposes this is the same as taking \(n\) steps  all that matters is that it scales linearly. We would say that the algorithm is \(O(n)\), or more precisely \(\Theta(n)\)  more on this notation below.
If we want to measure the number of basic operations that an algorithm takes, we run into the problem of what input we should be giving it. It might be the case that our algorithm can handle some inputs very quickly and other inputs of the same size need much more work. In this course, we will mostly focus on the worstcase, i.e. the maximum number of steps the algorithm requires to handle an input of size \(n\).
Here’s a toy example to demonstrate the idea: imagine we have a length\(n\) list L
of numbers, which we know contains at least one negative numbers, and we are trying to find the first negative number in L
. We might perform the following simple algorithm: loop through the elements x
of L
(in order), and if x
is negative then return x
. Here’s an implementation of that:
# This assumes that all of the entries of L are real numbers,
# and that there is at least one negative number.
def first_negative(L):
for x in L:
if x < 0:
return x
The worst case for this algorithm would be if the first negative number is the last element of the list; then we have to go through the entire list to find it. Until we find a negative number, for each element in the list we perform two steps: accessing that element and comparing its value to \(0\). In the worst case this would take \(2n\) steps, so this algorithm is linear(time).
Complexity analysis of an algorithm is about the inherent and coarse complexity of the algorithm, rather than the fine details of a particular implementation in code. You can also analyse the complexity of an implementation, and these will match if the implementation doesn’t introduce any extra complexity  but in practice it is often worth introducing some extra complexity to make the implementation simpler.
There’s another side to this discussion  what is the inherent complexity of the problem itself? Could we solve the problem above, of finding the first negative number in a list, with an algorithm that is faster than linear? In this case, no: if we do less than a linear amount of work, then we can’t look at every element of the list (or even any fixed proportion of them)  and we could construct a case where all of the entries we look at are positive but we don’t know anything about the others.
Asymptotic Notation#
To make our analysis more convenient, we introduce some new symbols. The precise definitions here are not critical, as it is unusual to work directly with the definitions.
We will take \(n\) to be a natural number variable, and \(f(n)\) and \(g(n)\) to be (eventually) positive functions of \(n\). You can drop the positivity requirement on \(g(n)\) if you take absolute values in the following discussion, but we do need to keep the positivity requirement on \(f(n)\). You should imagine that \(f(n)\) and \(g(n)\) either stay constant or trend upwards as \(n \to \infty\), since for us they represent the amount of steps an algorithm will take on an input of size \(n\).
BigO  “grows no faster than”#
We write \(g(n) \in O(f(n))\) or \(g(n) = O(f(n))\) to mean that \(f(n)\) is asymptotically an upper bound for \(g(n)\), up to some constant multiple; more precisely, that there is some natural number \(n_0\) and constant \(C\) such that \(g(n) \leq Cf(n)\) for all \(n > n_0\). You might prefer to think about this as saying that the ratio \(\frac{g(n)}{f(n)}\) is bounded as \(n \to \infty\) (ignoring any issues of division by 0), or that \(g(n)\) grows no faster than some constant multiple of \(f(n)\).
The use of \(=\) is somewhat misleading; if \(g(n) = O(f_1(n))\) and \(g(n) = O(f_2(n))\) this does not mean that \(O(f_1(n)) = O(f_2(n))\).
Examples#
\(3n^2 + 2n = O(n^3)\), since the ratio \(\frac{3n^2 + 2n}{n^3}\) tends to 0 and not \(\infty\).
It is also, for example, \(O(n^3 + 1)\), but it is unusual to write this  the \(1\) is irrelevant next to the \(n^3\) and \(O(n^3) = O(n^3 + 1)\).
\(3n^2 + 2n = O(n^2)\).
Even though \(3n^2 + 2n > n^2\), the ratio \(\frac{3n^2 + 2n}{n^2}\) tends to \(3\) rather than \(\infty\).
\(O(1)\) is a fancy way of writing “constant”
We say that an algorithm is \(O(f(n))\) if it takes \(g(n)\) steps and \(g(n) = O(f(n))\). The algorithm described in the introduction for finding a negative number in a list, which takes time proportional to \(n\), is \(O(n)\)  and also \(O(n^{200})\).
LittleO  “grows significantly slower than”#
Similarly, \(g(n) = o(f(n))\) if \(g(n)\) is asymptotically dominated by \(f(n)\), i.e. \(\frac{g(n)}{f(n)} \to 0\) as \(n \to \infty\).
Examples#
\(3n^2  2n = o(n^3)\)
\(3n^2  2n \neq o(n^2)\)
between \(n^2\) and \(n^3\), we have \(3n^2  2n = o(n^2\log n)\).
The algorithm in the introduction is not \(o(n)\), since it does take a linear amount of time, but it is \(o(n^2)\) since \(n^2\) asymptotically dominates any linear function.
BigTheta  “grows at roughly the same rate”#
Finally, \(g(n) = \Theta(f(n))\) if \(g(n) = O(f(n))\) and \(f(n) = O(g(n))\), i.e. \(g(n)\) and \(f(n)\) asymptotically grow at the same rate up to some constant multiples.
It is common practice in many books and papers to write \(g(n) = O(f(n))\) while really meaning \(g(n) = \Theta(f(n))\). The difference is that the \(g(n) = O(f(n))\) gives an upperbound for \(g(n)\), while \(g(n) = \Theta(f(n))\) gives an upper and lowerbound.
Examples#
\(3n^2  2n = \Theta(n^2)\)
if \(f\) and \(g\) are polynomials then \(g(n) = \Theta(f(n))\) if and only if they have the same degree
the algorithm in the introduction is \(\Theta(n)\).
Anonymous Functions#
Sometimes we use the notation above to replace functions in expressions. For example, \(\exp n + \frac{3}{2}n^2\log 2n  3n = \exp n + O(n^2\log n)\). You may have seen this used before when writing Taylor expansions, e.g. \(\ln (1  x) = x \frac{x^2}{2} \frac{x^3}{3}  O(x^4)\).
Basic Manipulations with Asymptotic Notation#
Asymptotic notation can make reasoning much easier by stripping away fiddly details. In this section we detail some of the basic things you can do with it. You might prefer to skim this section in favour of the more concrete applications in the next section (Analysing Code).
Ignore Positive Constants#
Positive constants do not matter for BigO, LittleO, and Big\(\Theta\) notation.
Examples#
\(12n^3 = \Theta(n^3)\)
\(n = O(3n)\)
\(3n = O(n)\)
Ignore SlowerGrowing Terms#
Get rid of slowgrowing terms: if \(f(n) = f_1(n) + f_2(n)\) where \(f_2(n) \in o(f_1(n))\), you can throw away \(f_2(n)\).
Be careful! While it is generally okay to throw away one slowgrowing term, if you have many small terms they can combine to be a fastgrowing term. For example, each of the individual terms \(i\) in \(\sum_{i=1}^n i\) is \(O(n)\), but together the sum is \(\Theta(n^2)\). The general rule is that you can safely throw away a constant number of slowgrowing terms.
Here’s a hierarchy of functions that commonly appear, in order of how fast they grow. Don’t worry about this table too much!
Name 
Notation 
Notes 
Example 

Constant 
\(O(1)\) 
\(3 = O(1)\) 

Logarithmic 
\(O(\log n)\) 
Any base 
\(3\log n+12 = O(\log n)\) 
Polynomial 
\(O(n^a)\) 
For any \( 1 \leq a < b\): \(n^a = o(n^b)\) 
\(n^\pi + 3n^2 + 2\log n = O(n^\pi)\) 
Polynomial times logarithm 
\(O(n^a \log n)\) 
For any \(\varepsilon > 0\): \(n^a = o(n^a \log n),\) \(n^a\log n = o(n^{a + \varepsilon})\) 
\(n^3 \log n + 12n^3 = O(n^3 \log n)\) 
Exponential 
\(O(\exp (an))\) 
For any \(0 < a < b\): \(\exp(an) = o(\exp(bn))\) 
\(3\exp (2n) + n^{10000}\exp\left(\frac{3n}{2}\right) = O(\exp 2n)\) 
Factorial 
\(O(n!)\) 
\( n! + \exp (10^{1000} n) = O(n!)\) 
The last two lines showcase one of the weaknesses of asymptotic analysis: by the time \(\exp (2n)\) dominates \(n^{10000}\exp\left(\frac{3n}{2}\right)\), the values are impossibly large for runtimes anyway. Algorithms which only beat other algorithms in impossibly large cases are called galactic algorithms.
Examples#
You can throw away all but the highest power in a polynomial
\(O(n^2 + \log n) = O(n^2)\)
\(O(n^3\log n + n^3) = O(n^3\log n)\)
\(O(a^n + n^{k}) = O(a^n)\) for any \(a > 1\) and \(k > 0\).
Multiplying Asymptotics#
If you are adding together \(f(n)\) many terms which are each \(O(g(n))\), then you can combine them into one \(O(f(n)g(n))\) term. In general this only gives you an upper bound, but it is also true that adding \(f(n)\) many terms which are each \(\Theta(g(n))\) results in a \(\Theta(f(n)g(n))\) term. Be careful not to mix together \(O\) and \(\Theta\) though!
Examples#
In the sum \(\sum_{i=1}^{n} i\), we have \(n\) terms which are each \(O(n)\), giving us \(O(n\times n)\) in total.
We can also say that the sum is \(\Theta(n^2)\) since we know that \(\sum_{i=1}^{n} i = \frac{n(n1)}{2}\).
In the sum \(\sum_{i=1}^{n^3} (i + 3)\), we have \(n^3\) many \(O(n)\) terms, each of which is \(O(n^3)\), so we have an upper bound that the sum is \(O(n^6)\).
For the same sum, we can’t automatically say that it is \(\Theta(n^6)\) since not all of the terms are \(\Theta(n)\) (e.g. the first term is always just \(4\)). However, we can show that is is \(\Theta(n^6)\) by applying standard tricks from analysis (only considering part of the sum, and then replacing the terms in it with smaller terms) to also obtain a lower bound of
Similarly, \(\sum_{i = 1}^{2n} (3i\log i + i) = \Theta(n^2\log n)\).
Analysing Code#
In this section, we apply the ideas above to analysing some actual code. We will first see some simple examples, and then look at some more important examples of sorting algorithms.
Example 1#
This code concatenates two lists into a new list:
def concat(L, M):
out = []
for x in L:
out.append(x)
for x in M:
out.append(x)
return out
concat([1, 2, 3], ["a", "b", "c"])
[1, 2, 3, 'a', 'b', 'c']
In order to determine the complexity of this algorithm, we first have to decide what our “input size” parameter \(n\) is. If we write \(l\) and \(m\) for the sizes of the two lists, then in this case it makes sense to consider \(n = l + m\). Having decided this, what is the complexity?
Creating an empty list is \(O(1)\)  it doesn’t depend on \(n\).
We then perform two loops, each of which contains a single
append
operation.If we consult a reference we see that the “Amortized Worst Case” for
append
on alist
is \(O(1)\) (see below for an explanation of “Amortized”).The first loop runs \(l\) times and the second \(m\) times, so we have an \((l + m)O(1) = O(l + m) = O(n)\) contribution from the loops.
Finally, we will assume that
return
is also \(O(1)\).
Overall, this algorithm is \(O(1) + O(n) + O(1) = O(n)\), i.e. it is linear in the sum of the sizes of the two lists.
Example 2#
This example (inefficiently) finds all of the elements which lie in the intersection of two lists (without repetition).
def inter(L, M):
out = []
# Go through each element of L.
# If it is also in M, and we haven't seen it before, then append it to out.
for x in L:
if x in M and x not in out:
out.append(x)
return out
inter(["a", "c", 1, 2, 3.14, 2], [3.14, 2, 2, "hello"])
[2, 3.14]
Let’s analyse this in the same way as the previous example, and with the same notation.
Creating the empty list
out
is \(O(1)\).The loop will give a contribution of \(O(l \times f(n))\), where \(f(n)\) is the number of steps taken inside the loop.
Inside the loop, we check
x in M and not x in out
. A quick search of the internet for “python list membership complexity” reveals that doing testingx in M
is \(O(m)\) andx not in out
is \(O(\text{out})\). We know that out can’t be longer than \(M\), so overall this is just \(O(m)\).We again assume that
return
is \(O(1)\).
The runtime here is dominated by the loop, giving an \(O(lm)\) complexity. If we want this in terms of one parameter, we could instead write \(O(n^2)\). It is left as an exercise to show that it is in fact worstcase \(\Theta(n^2)\)  you will have to bound \(lm\) in terms of \(n^2\).
Example 3#
This example calculates the sum
def strange_sum(n):
total = 0
for i in range(1, n+1):
for j in range(2, 3*n+1, 2):
term = 1 / (i**2 * j)
total += term
return total
strange_sum(10)
2.571242109163766
Analysing this:
Setting
total = 0
is \(O(1)\)The outer loop repeats \(n\) times.
The inner loop repeats roughly \(\frac{3n}{2} = O(n)\) times.
The work done inside the inner loop is constant, so each inner loop does \(O(n)\) work in total.
So the nested loops do \(O(n^2)\) work in total.
Overall, the function takes \(O(n^2)\) steps.
To be more precise, we could replace each of the \(O\)s above with \(\Theta\).
The very rough ruleofthumb is that \(k\) nested forloops, where the innermost loop does constant work, often means \(\Theta(n^k)\) complexity. However, this is not always the case: we could calculate
using two nested forloops, and the complexity would be \(\Theta(n^4)\).
Example 4#
This example is a more efficient version of Example 2, for finding the intersection of two lists.
def inter(L, M):
# Make sets from L and M
L_set, M_set = set(L), set(M)
# Intersect those sets, and return the answer as a list
return list(L_set & M_set)
inter(["a", "c", 1, 2, 3.14, 2], [3.14, 2, 2, "hello"])
[2, 3.14]
Our analysis now looks like this:
We don’t know what the complexity of doing
set(L)
is, but we can make an educated guess:We could create
set(L)
by first making an empty set \(O(1)\) and then adding each element from \(L\) to the set if it isn’t already in it.Testing whether an element is in a set is worstcase \(O(\text{set})\) (reference).
Here, we know that the size of the set is at most \(l\), so in total this set creation is \(O(l^2)\).
Similarly,
set(M)
is \(O(m^2)\).
Overall, the line
L_set, M_set = set(L), set(M)
is \(O(l^2) + O(m^2) = O(n^2)\).We next need to know the complexity of
L_set & M_set
. The largest that these sets could be is \(l\) and \(m\), and the same reference document as above says that intersection will take in the worst case \(O(lm) = O(n^2)\).Making a
list
from the intersection just requires us to iterate through the intersection and repeatedly append each value to an initially empty list. This will take no more than \(O(n)\) steps.
Overall, this is still \(O(n^2)\). In practice, however, it is massively more efficient than Example 2. This is because the worstcase complexity of set membership and intersection are very unlikely to arise (note the average complexities listed at that reference). Worstcase complexity analysis can be misleading!
Sorting Algorithms#
One of the most important problems in computation is sorting a list (or similar things like a numpy array). It is a critical step in many algorithms, and so a great deal of work has gone into optimising sorting code.
There are a vast number of different ways to sort lists. We will talk about two which fall into the class of comparisonbased sorting. These are algorithms that only rely on \(\leq\) to compare the elements of the list, and don’t need any more information. The key is that you can define \(\leq\) for lots of things, not just numbers:
# The list containing only "a" is <= the list containing "a" and "b" and 2
["a"] <= ["a", "b", 2]
True
This means that comparisonbased sorting is very general. In some cases where the elements of your list have specific properties there are better options (like integer sorting algorithms).
Bubble Sort#
Bubble sort is a very simple comparison sort. It generally performs poorly, but is instructive to analyse. Given a length\(n\) list \(L\) of objects, it goes like so: repeatedly go through the list from the start to the end, swapping adjacent pairs of objects in the list if they are not in the correct order. Stop after you have passed through the entire list without making any swaps (since then everything must be in the right order).
If you happen to prefer to understand algorithms through the medium of Hungarian folk dance, then I have good news (though as the comments mention, it is not an entirely accurate representation).
In pseudocode, this is:
Set a boolean variable
swapped = True
While
swapped == True
, repeat the following: 3. Setswapped = False
. 4. Fori
in \(0, 1, \ldots, n  2\): 5. If notL[i] <= L[i + 1]
then: 6. SwapL[i]
andL[i+1]
; 7. Setswapped = True
.
The variable swapped
simply represents whether we made any swaps on a pass through the list.
In code:
def bubble_sort(L):
swapped = True
while swapped:
swapped = False
for i in range(len(L)  1):
if not L[i] <= L[i+1]:
L[i], L[i+1] = L[i+1], L[i]
swapped = True
return L
bubble_sort([6, 1, 1, 2, 5])
[1, 1, 2, 5, 6]
There is a big issue when analysing the complexity of this; we can show that the complexity of the inner for
loop is \(O(n)\), but what do we do for the while
loop? We need to know how many times the while
loop will run. The key in this case is that after \(k\) passes through the list, the largest \(k\) elements will always have “bubbled” to the end of the list and be in the correct place (exercise in induction!). As a quick example, here is the sequence of lists that bubble sort produces in the first two passes:
Initial list:
[6, 1, 1, 5, 2]
First pass:
[1, 6, 1, 5, 2]  Swapped indices 0 and 1
[1, 1, 6, 5, 2]  Swapped indices 1 and 2
[1, 1, 5, 6, 2]  Swapped indices 2 and 3
[1, 1, 5, 2, 6]  Swapped indices 3 and 4, largest element now in right place
Second pass:
[1, 1, 5, 2, 6]  Swapped indices 0 and 1
[1, 1, 5, 2, 6]  Did not swap indices 1 and 2
[1, 1, 2, 5, 6]  Swapped indices 2 and 3
[1, 1, 2, 5, 6]  Did not swap indices 3 and 4, largest two elements now in right places
...
Once the largest \(n\) elements (i.e. all elements) are in the right place, we are done. This means that the while
loop repeats at most \(n\) times, so the algorithm is \(O(n^2)\). To show that it is \(\Theta(n^2)\), we could exhibit a list of length \(n\) that does take \(\Theta(n^2)\) steps, for each \(n\) (exercise!).
You may have noticed that there is “wasted work” in our bubble sort. For example, if the last \(k\) elements are in the correct order then there is no need to consider them each time through the loop. There are many ways of optimising the performance of bubble sort, but without fundamentally changing the algorithm it will still be \(\Theta(n^2)\).
Insertion Sort#
Bubble sort is mostly a “toy” sorting algorithm; it is inefficient and rarely seriously used. The following algorithm is widelyused, though we will see that it has the same theoretical complexity as bubble sort. Given a length\(n\) list \(L\), the idea is to start a new empty list \(M\), and then repeatedly take an element from \(L\) and “insert” it into the right place in \(M\). At each stage, \(M\) will be in the right order, and after \(n\) insertions it will contain all of the elements from \(L\). In pseudocode:
Set
M = []
.For each
x in L
: 3. Find the indexi
at whichx
should be inserted intoM
to keepM
in order. 4. Insertx
at that index inM
There’s an example of this being done after the following discussion. Once again, those with a preference for Hungarian folk dance are in luck (note that the version of insertion sort they are doing is a little different, but the idea is the same).
We leave the implementation of insertion sort as an exercise, but will discuss the complexity:
Initially creating
M
is \(O(1)\).We loop \(n\) times. Each time, we: 3. Find the index
i
.  This could be done in \(O(M)\) steps by just going through all of the indices to find one that works.  Alternatively, it can be done in \(O(\log M)\) steps by doing a binary search inM
, sinceM
is sorted. This just means first comparingx
to the middle element ofM
to see which half it lies in; then comparing against the middle element of that half and so on. This requires \(O(\log_2 (M))\) steps since we are cutting the search space in half at each stage. 4. Insertingx
intoM
. You might hope that this is \(O(1)\), but it is unfortunately actually an \(O(M)\) operation (since you have to shift everything afterx
down by 1 position). See the reference. 5. So overall, we do \(O(M)\) work each time through the loop.
We can bound \(M\) above by \(n\), so this algorithm is \(O(n^2)\). It is also \(\Theta(n^2)\); the worst case is when \(L\) is in reverseorder, since then you have to shift every element in \(M\) down by one every time you insert a new element, resulting in \(1 + 2 + 3 + \ldots + n  1 = \Theta(n^2)\) shifts.
Even though insertion sort and bubble sort have the same complexity and bubble sort is generally not useful, insertion sort is used. It is a practical algorithm for dealing with small lists, and is often used as a sort of recursive “base case” of more complex algorithms. It is also a very natural algorithm, and similar to what you might do by hand.
There are two morals here:
Worstcase asymptotic analysis doesn’t tell you everything.
Be careful when inserting things into a list; it is often a source of accidental gains in complexity.
Insertion Sort Example#
Initial lists:
L = [5, 1, 3, 4]
M = []
Insertion 1:
x = 5
i = 0  only one possible index to insert x into M
M = [5]
Insertion 2:
x = 1
i = 0  1 should go before 5, so index 0
M = [1, 5]
Insertion 3:
x = 3
i = 1  3 should go after 1 and before 5, so index 1
M = [1, 3, 5]
Insertion 4:
x = 4
i = 2  4 should go after 3 and before 5, so index 2
M = [1, 3, 4, 5]
The Best Possible Sort?#
We’ve seen two sorting algorithms, both of which have complexity \(O(n^2)\). A natural question to ask would be if we can find a sorting algorithm with a lower complexity, and what the lowest possible complexity would be. The answer to the first question is yes: there are many known sorting algorithms with \(\Theta(n\log n)\) complexity, some of which we will discuss in the Pure section of this course. The answer to the other question is also known:
Theorem#
Any comparisonbased sorting algorithm requires at least \(\Theta(n\log n)\) comparisons.
Proof(ish)#
The key is to think about the amount of information that we can gain from doing comparisons. Remember that comparisonbased sorting only assumes that you can compare objects using \(\leq\), and in particular the only way we can get any information about the order the objects should come in is by comparing two of them. This means that if we perform \(k\) comparisons, we can at most distinguish between \(2^k\) orders of a list, since each comparison just gives us one of two values. Suppose in the worstcase, we perform \(M\) comparisons on our list of length \(n\). This means that there are at most \(2^M\) different “paths” our algorithm can take while sorting the list, imagining each test \(x \leq y\) is a “fork in the road” of our algorithm.
Each different ordering of our list must take a different path through our algorithm (why?) and a list of length \(n\) can be ordered in \(n!\) different ways. \(2^M \geq n!\). There’s a nice result called Stirling’s Approximation, which is a bit more powerful than we need but worth knowing anyway, which tells us that asymptotically \(n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^n\). Taking logs (base 2) gives us
That is, we require roughly \(n\log n\) comparisons to distinguish between all the different orderings of lists.
If you are allowed to do more than just compare two objects with \(\leq\), then you can potentially do better.
Amortized Complexity#
You will often see reference to “amortized” complexity when looking up how long basic operations take, like appending an element to a list. The precise definition isn’t too important here, but the basic meaning is this: if you repeatedly do an amortized \(O(f(n))\) operation, then on average over those repeated calls it will take \(O(f(n))\) time at worst.
This is subtly different from the average complexity, where you are averaging over all of your inputs but not assuming that the operation is done more than once.
For example, if you repeatedly append to a single list, then it will almost always take constant time. However, when your list reaches certain sizes more space needs to be allocated for it, and this takes a nonconstant amount of time. Fortunately, this happens so rarely that when your repeated calls to append
are averaged out they only take \(O(1)\) time. In this course, we won’t be picky if you say that something is \(O(f(n))\) and it is really only amortized \(O(f(n))\).