Lectures 4-5

Divide and Conquer

Algorithms based on the divide and conquer technique reduce a larger input into $a$ inputs which are $b$ times smaller, solve them using the same algorithm, and then combine the partial results into the solution of the full problem. Binary search can be understood as an example of the divide and conquer technique -- after comparing $x$ with the number in the middle, it is sufficient to solve $a=1$ subproblem which is $b=2$ times smaller! We will see more examples in this lecture.

Example: multiplication of polynomials

In this example we use arrays to denote polynomials. The array $a[0..d-1]$ is used to denote the polynomial $A(x) = \sum_i a[i] x^i$ of degree less than $d$ (and similarly $b$, $c$, etc.). We want to multiply two such polynomials.

INPUT: Arrays a[0..d-1], b[0..d-1]
OUTPUT: Array c[0..2d-2] such that $C(x) = A(x) B(x)$.


For example, for $a=[1,1,1]$, $b=[1,2,2]$, we have $c=[1,3,5,4,2]$. If you prefer, you can think about multiplication of numbers rather than polynomials -- $A(10)$ is the number, and $a$ are its consecutive digits (starting with the least significant one) -- the example above corresponds to $111\cdot 221=24531$. Numbers in a positional system are multiplied exactly like polynomials, but the process is slightly more complicated because of having to 'carry' when a temporary result is greater than 9.

The trivial algorithm which we have learned in primary school simply multiplies each value in $a$ by each value in $b$. This algorithm has time complexity $O(d^2)$.

The more efficient Karatsuba algorithm is based on divide and conquer. Assume that $d=2s$. Split the arrays a and b into two halves of length s each. Thus, we obtain $a_0=a[0..s-1]$, $a_1=a[s..d-1]$, $b_0=b[0..s-1]$, $b_1=b[s..d-1]$. If we denote by $A(x)$, $B(x)$, ... the value of the polynomial described by $a$, $b$, ..., we have $A(x) = A_0(x) + x^s A_1(x)$, and $B(x) = B_0(x)+x^s B_1(x)$. Therefore, $C(x) = A(x) \cdot B(x) = A_0(x)B_0(x) + (A_0(x)B_1(x) + A_1(x) B_0(x))x^s + A_1(x) B_1(x)x^d$. Let $C_0(x)=A_0(x)B_0(x)$, $C_1(x)=(A_0+A_1)(x)(B_0+B_1)(x)$, and $C_2(x) = A_1(x)B_1(x)$; thus, we have $C(x) = C_0(x) + (C_1(x)-C_0(x)-C_2(x))x^s + C_2(x)x^d$. This leads to the following algorithm (pseudocode):
to multiply(a[0..d-1], b[0..d-1]):
  if d==1:
    just multiply a[0] by b[0]
  else:
    let s=d/2
    a0=a[0..s-1]
    a1=a[s..d-1]
    b0=b[0..s-1]
    b1=b[s..d-1]
    c0=multiply(a0,b0)
    c1=multiply(a0+a1,b0+b1)
    c2=multiply(a1,b1)
    return c0 + shift(c1-c0-c2, s) + shift(c2, d)
(Note: This pseudocode assumes that $d$ is even -- if not, we can simply add an extra 0)

In this pseudocode, + and - correspond to addition/subtraction of polynomials, which can be done in $O(d)$ -- simply return the array whose $i$-th element is $a[i]±b[i]$ for each $i$. shift(a, i) adds i zeros to the front of array a, which corresponds to multiplication of $A$ by $x^i$.
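
For concreteness, here is one possible runnable version of the pseudocode above in Python. The helper functions add_poly, sub_poly and shift are our own names for the operations just described (they are not library functions), and this sketch also pads odd-length inputs with an extra 0, as mentioned in the note above:
def add_poly(a, b):
    # coefficient-wise addition; the shorter array is treated as padded with zeros
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0) for i in range(n)]

def sub_poly(a, b):
    # coefficient-wise subtraction
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) - (b[i] if i < len(b) else 0) for i in range(n)]

def shift(a, i):
    # multiply A(x) by x^i, i.e., prepend i zeros
    return [0]*i + a

def multiply(a, b):
    d = max(len(a), len(b))
    if d == 1:
        return [a[0] * b[0]]
    if d % 2 == 1:
        d += 1                      # make d even by padding with an extra 0
    a = a + [0] * (d - len(a))
    b = b + [0] * (d - len(b))
    s = d // 2
    a0, a1 = a[:s], a[s:]
    b0, b1 = b[:s], b[s:]
    c0 = multiply(a0, b0)
    c1 = multiply(add_poly(a0, a1), add_poly(b0, b1))
    c2 = multiply(a1, b1)
    return add_poly(add_poly(c0, shift(sub_poly(sub_poly(c1, c0), c2), s)), shift(c2, d))

# multiply([1,1,1], [1,2,2]) returns [1, 3, 5, 4, 2, 0, 0] -- the example above, with trailing zeros coming from the padding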

What is the time complexity of this algorithm? Note that to multiply $A$ and $B$ of length $d>1$, we need to do three multiplications of numbers of length $s=d/2$, and several additions/subtractions, which are all done in $O(d)$. Therefore, if we denote the running time of this algorithm for arrays of length $d$ by $T(d)$, we get that $T(d) = 3T(d/2) + O(d)$ for $d \geq 2$ (and $T(1) = O(1)$).

How to solve this? Assume that $d=2^n$. By rewriting $T(d/2)$ using the same formula, and repeating until we reach $T(1)$, we obtain that $T(2^n) = O(2^n) + O(2^{n-1}3) + \ldots + O(3^n) = O(3^n)$. Thus, $T(d) = O(3^{\log_2 d}) = O(d^{\log_2 3})$ as long as $d$ is a power of 2. If $d$ is not a power of 2, take $d'$ to be the smallest power of 2 larger than $d$ -- we still have $T(d) < T(d') = O(d'^{\log_2 3}) = O(d^{\log_2 3})$, because $d'< 2d$.

Therefore, the Karatsuba algorithm based on Divide and Conquer runs in time $O(d^{\log_2 3}) = O(d^{1.585})$, which is better than the trivial algorithm running in time $O(d^2)$. The memory complexity of both algorithms is $O(d)$ (for the Karatsuba algorithm this will be shown below). Thus, for example, if we take $d=1024$, we can expect the trivial algorithm to execute 1048576 multiplications, while the Karatsuba algorithm will execute just 59049 multiplications -- much faster! In practice, the hidden constant in the $O$ notation is quite high, so the trivial algorithm will still run faster for small values of $d$ -- we can solve this by changing the base case of our algorithm ("if $d==1$") to use the trivial algorithm for all sufficiently small values of $d$.

Note that the Karatsuba algorithm is not the best known algorithm for this problem -- the Fast Fourier Transform (FFT) can be used to multiply polynomials or large numbers in roughly $O(d \log d)$ time (this algorithm will not be covered in this course).

Master theorem for divide-and-conquer recurrences

The Master theorem for divide-and-conquer recurrences provides a general way to solve recurrences like this. Suppose that $T(n) = aT(n/b)+f(n)$, where $a \geq 1$ and $b > 1$. Let $c=\log_b a$. Then:
1. If $f(n) = O(n^{c-\varepsilon})$ for some $\varepsilon > 0$, then $T(n) = \Theta(n^c)$.
2. If $f(n) = \Theta(n^c)$, then $T(n) = \Theta(n^c \log n)$.
3. If $f(n) = \Omega(n^{c+\varepsilon})$ for some $\varepsilon > 0$ (and $f$ satisfies a mild regularity condition, e.g. $a f(n/b) \leq q f(n)$ for some constant $q < 1$), then $T(n) = \Theta(f(n))$.
In the first case, the recursive part (solving the subproblems) dominates the non-recursive part (splitting into subproblems and merging the results). In the last case (which rarely occurs in practice), the non-recursive part dominates. In the middle case, both the recursive and the non-recursive part contribute to the resulting complexity. Examples: for the Karatsuba recurrence $T(d) = 3T(d/2) + O(d)$ we have $c = \log_2 3 > 1$, so the first case applies and $T(d) = \Theta(d^{\log_2 3})$; for MergeSort (below), $T(n) = 2T(n/2) + O(n)$, so $c = 1$ and the middle case gives $T(n) = \Theta(n \log n)$; for binary search, $T(n) = T(n/2) + O(1)$, so $c = 0$ and the middle case gives $T(n) = \Theta(\log n)$.

Sorting

Now, we will be talking about the problem of sorting:
INPUT: Array a[0..n-1]
OUTPUT: Array a'[0..n-1] such that $a$ and $a'$ have the same elements, but $a'[0] \leq a'[1] \leq ... \leq a'[n-1]$.

Although we need to sort something in our programs very frequently, we rarely need to actually implement a sorting algorithm, as all the popular programming languages have sorting functions in their standard libraries. However, it is still good to know the theory, and there are situations where our program will run faster if we implement our own sorting algorithm.

Furthermore, sorting algorithms are a good showcase of various basic techniques used throughout algorithmics, from ones we have already learned (counting, Divide and Conquer) to new ones (data structures).

Some theory first...

Theorem: Any sorting algorithm based on comparison requires $\Omega(n\log n)$ comparisons to sort an array of length $n$.

Proof (sketch). This could be viewed as a puzzle: we have $n$ coins and we want to order them from lightest to heaviest. We can compare two coins using the scales. How to do this if we want to use the scales the smallest possible number of times? As an example, let's try $n=5$. The coins can be ordered in 5! = 120 possible ways. After one comparison, 60 ways are left. If we perform the second comparison correctly, there are at least 30 possibilities left in the worst case (i.e., the one where the number of possibilities is the greatest); by continuing this process (120 -> 60 -> 30 -> 15 -> 8 -> 4 -> 2 -> 1), we learn that at least 7 comparisons are necessary in the worst case. In general, we need $\log_2(n!)$ comparisons, which is $\Omega(n \log n)$ (because $n! > (n/2)^{n/2}$).
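
Spelling out the last step of the argument: $n!$ contains at least $n/2$ factors which are each at least $n/2$, hence $\log_2(n!) \geq \log_2\left((n/2)^{n/2}\right) = \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)$.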

Classic sorting algorithms

The following algorithms have been covered in the lecture. We only present the general idea in these notes -- the details can be found e.g. in the CLRS book; Internet sources such as Wikipedia tend to be rather reliable too.

InsertionSort

InsertionSort works as follows. We add the first, second, third, fourth, etc. element to the sorted array, always inserting it into the correct position.
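
A minimal Python sketch of this idea (written here only for illustration; note how inserting a new element pushes the later elements to the right, which is discussed below):
def insertion_sort(a):
    # invariant: before each iteration, a[0..i-1] is already sorted
    for i in range(1, len(a)):
        x = a[i]
        j = i
        # push elements greater than x one position to the right
        while j > 0 and a[j-1] > x:
            a[j] = a[j-1]
            j -= 1
        a[j] = x   # insert x into its correct position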

Unfortunately, as we already know, whenever we insert a new element into an array which has $k$ elements, we need $O(k)$ time to push the later elements to the right. This makes the time complexity of InsertionSort $O(n^2)$ even though we could do only $O(n \log n)$ comparisons when using Binary Search. For this reason, we usually do not use Binary Search when implementing InsertionSort -- it does not make it faster, and actually makes it slower in special cases, such as an array which is already sorted.

InsertionSort has time complexity $O(n^2)$, memory complexity $O(1)$, stable (if two elements are equal, they remain in the same order); fast ($O(n)$) when the input is already sorted.

MergeSort

This sorting algorithm is based on the Divide and Conquer technique: split the array into two halves, sort them, and merge them.
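
A short Python sketch (the merging step is the heart of the algorithm; this simple version allocates new lists, which is where the $O(n)$ extra memory comes from):
def merge(left, right):
    # combine two sorted lists into one sorted list
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:    # <= keeps the sort stable
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]

def merge_sort(a):
    if len(a) <= 1:
        return a
    mid = len(a) // 2
    return merge(merge_sort(a[:mid]), merge_sort(a[mid:]))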

Time complexity is always $O(n \log n)$, memory complexity $O(n)$, stable.

QuickSort

This also can be seen as a sorting algorithm based on the Divide and Conquer technique, but it works in a different way. We choose a pivot, move elements smaller than the pivot to the left, and larger than the pivot to the right. (This is done using the "Polish flag" method shown during the exercise sessions.) Then we sort both parts using the same algorithm.
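
A short Python sketch of the idea (for clarity this version builds new lists rather than partitioning in place with the "Polish flag" method, so it is only an illustration of the divide-and-conquer structure; the pivot is chosen randomly, see the next paragraph):
import random

def quick_sort(a):
    if len(a) <= 1:
        return a
    pivot = random.choice(a)
    smaller = [x for x in a if x < pivot]
    equal   = [x for x in a if x == pivot]
    larger  = [x for x in a if x > pivot]
    return quick_sort(smaller) + equal + quick_sort(larger)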

There are bad cases where it runs in time $\Theta(n^2)$ (when one of the smallest/largest elements is always chosen to be the pivot), but on average it runs very quickly ($O(n \log n)$). If we take the pivot randomly, the algorithm will run very quickly with very high probability, so in this particular case, the average time complexity is practically more useful than the pessimistic (worst-case) one.

Memory complexity is usually $O(\log n)$, but it can be $O(n)$ in the worst case. (This is because, when a function calls itself recursively, the computer needs to remember the previous call (the so-called recursion stack); therefore, in the worst case above, the computer will have to remember $n$ calls on the recursion stack. It is possible to do this more cleverly and use $O(\log n)$ memory even in the worst case -- use recursion only for the smaller part, and then solve the bigger part with a loop instead of a recursive call.)

HeapSort

HeapSort is interesting, because it is the first algorithm based on a non-trivial data structure.

Data structures are ways of arranging data. In most data structures there is some important property that defines how data is arranged. This property has to be satisfied after every operation (this property is called the invariant of the structure -- similarly to loop invariants, which must be satisfied after every iteration of the loop). So far we have seen two basic data structures: the unsorted array (no invariant) and the sorted array (the invariant here is that the array must be sorted after every operation).

In an unsorted array, we can add a new element very quickly, but it takes very long to find anything. In a sorted array, we can find every given element very quickly (the invariant helps us); however, as we have seen in InsertionSort, adding a new element takes time (we need to spend extra time to keep the invariant satisfied). This is a common tradeoff in algorithmics (as well as in real life): it takes more time to make the data more ordered, but then it takes less time to search them!

HeapSort is based on the data structure known as the complete binary heap. This is a data structure into which we can insert multiple elements (in time $O(\log n)$ each), quickly find the current largest element (in time $O(1)$), and remove the current largest element (in time $O(\log n)$ each). This is perfect for sorting -- we first add all the elements to the heap, then we repeatedly take the greatest element in the heap and remove it. The time complexities stated above give us an $O(n \log n)$ algorithm.

A complete binary heap is represented as an array $a[1..i]$. (In most programming languages, including Python and C++, array indices start with 0; however, it is a bit easier to describe the complete binary heap starting with index 1. In practice, we can just add a dummy [0] element and not use it.) We arrange the elements into a tree-like structure (see this visualization), where elements $2k$ and $2k+1$ are considered the children of element $k$. The heap has to satisfy the invariant called the heap property: the value in the parent is always greater than or equal to the values in its children. (Note: in the visualization, only the 'white background' part is in the heap; $i$ is the index of the last element in the white part. Ignore the remaining part for now, the heap property is not satisfied there.) ("Binary" refers to every element having two spots for children, and "complete" refers to the shape where all the possible children spots are filled, except possibly the last row, which may be only partially filled, but must be filled from the left without gaps.)

The heap property ensures that the currently largest element is always $a[1]$. To add an element to the heap (with $i$ elements), we put it at the last position (as $a[i+1]$), then we move it upwards as long as it is greater than its parent. Note that we make at most $O(\log n)$ steps here.
def upheap(a, i):
    # move a[i] up while it is greater than its parent a[i//2]
    while i > 1 and a[i//2] < a[i]:
        (a[i], a[i//2]) = (a[i//2], a[i])
        i = i//2


To remove the greatest element from the heap, we switch its location with the last element ($a[i]$), then remove it. After this, the new element $a[1]$ is probably not placed correctly, so we similarly move it downwards to its correct place:
def downheap(a, i, n):
    # move a[i] down while it is smaller than one of its children (a[2*i] or a[2*i+1])
    while True:
        left_greater = 2*i <= n and a[2*i] > a[i]
        right_greater = 2*i+1 <= n and a[2*i+1] > a[i]
        if left_greater and right_greater:
            # both children are greater than a[i] -- swap with the greater of the two
            if a[2*i] > a[2*i+1]:
                right_greater = False
            else:
                left_greater = False
        if left_greater:
            (a[i], a[2*i]) = (a[2*i], a[i])
            i = 2*i
        elif right_greater:
            (a[i], a[2*i+1]) = (a[2*i+1], a[i])
            i = 2*i+1
        else:
            return


It is possible to implement HeapSort in memory $O(1)$. To do this, we use the first $i$ elements of the array $a[1..n]$ given in the input as the heap. During the first phase (storing all the elements in the heap), after $i$ steps, the first $i$ elements are in the heap, and the remaining elements have not yet been added. During the second phase (moving the elements from the heap back to the array in the correct order), after $j$ steps, the last $j$ elements are the greatest $j$ elements which have been already found, and the remaining $n-j$ elements are in the heap. (In the visualization, the elements in the heap are white, and elements yet to be inserted to heap / already sorted are gray.) The whole algorithm is implemented as follows:

# phase 1: insert a[1], ..., a[n] into the heap one by one
for i in range(1, n+1):
    upheap(a, i)
# phase 2: repeatedly move the greatest element of the heap to the end of the (shrinking) heap
for i in range(n, 1, -1):
    (a[1], a[i]) = (a[i], a[1])
    downheap(a, 1, i-1)
Constructing the heap can be implemented more efficiently than we have shown here (in time $O(n)$), but the upheap function will be useful to us again later.

Time complexity of HeapSort is always $O(n \log n)$, memory complexity $O(1)$, but not stable. It is a bit faster than MergeSort in practice, but the lack of stability may be an issue in some situations.

Sorting in real life

What sorting algorithms are used in practice? Popular programming languages today have built-in sorting functions. They are based on the rough ideas given above, but they use powerful optimizations, which make them even better. We will briefly look at the implementations in C++ and in Python.

C++

C++ has two sorting functions available: sort and stable_sort. Stability is an important property in many applications, so we often want to use a stable algorithm; most popular implementations of C++ use MergeSort for stable_sort. However, it takes extra resources to guarantee stability, so if stability is not necessary, it is better to use an unstable algorithm.

The fastest algorithm in practice is QuickSort, and so the implementation of sort is based on QuickSort (with a smart method of choosing the pivot). However, QuickSort has two disadvantages mentioned above: the worst-case running time is $\Theta(n^2)$, and the recursion stack may grow to $\Theta(n)$ in the worst case. For this reason, typical implementations of sort (the combination is known as IntroSort) monitor the depth of the recursion and switch to HeapSort when it gets too deep, which guarantees $O(n \log n)$ worst-case time; they also switch to InsertionSort for very short fragments, where it is faster in practice.

Python

For simplicity, Python has only one sorting function, which sorts in a stable way. As above, the general idea is based on the best stable sorting algorithm covered in these notes (MergeSort), but it uses the insights provided by other algorithms in order to improve it.

We have mentioned that InsertionSort works very quickly (in $O(n)$) on data that is already sorted. Similarly, it is easy to construct an algorithm that works very quickly on data that is in the reverse order. All the other algorithms mentioned above (QuickSort, MergeSort, HeapSort) take $O(n \log n)$ time in the best case.

The sorting algorithm implemented in most versions of Python, called TimSort, combines the advantages of MergeSort and InsertionSort. We first look for 'runs' in our data: these are segments which are already sorted, or which are sorted in reverse. The reverse-sorted segments are reversed, and afterwards all the runs are merged, as in MergeSort.
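
A toy illustration of the run-finding idea (this is not the real TimSort, which adds many further optimizations such as a minimum run length and a clever order of merging; merge is the function from the MergeSort sketch above):
def find_runs(a):
    # split a into maximal non-decreasing or strictly decreasing segments;
    # decreasing segments are reversed, so every returned run is sorted
    runs = []
    i = 0
    while i < len(a):
        j = i + 1
        if j < len(a) and a[j] < a[i]:
            while j < len(a) and a[j] < a[j-1]:
                j += 1
            runs.append(a[i:j][::-1])
        else:
            while j < len(a) and a[j] >= a[j-1]:
                j += 1
            runs.append(a[i:j])
        i = j
    return runs

def tim_sort_toy(a):
    runs = find_runs(a)
    # merge the runs pairwise until only one run remains
    while len(runs) > 1:
        runs = [merge(runs[k], runs[k+1]) if k+1 < len(runs) else runs[k]
                for k in range(0, len(runs), 2)]
    return runs[0] if runs else []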

This yields a sorting algorithm which is not only $O(n \log n)$ in the worst case, but also $O(n)$ in many "easy" cases. Since these "easy" cases actually occur quite often in real world data, TimSort is very fast in practice.

Can we sort even better?

The last section may give an impression that it is pointless to implement sorting algorithms of our own -- after all, our programming languages use algorithms which are more sophisticated than what we could probably implement ourselves, and also more efficient. We have also proven a Theorem saying that we cannot sort data faster than in $O(n \log n)$.

However, this is not the case -- there are situations when a simple sorting algorithm actually works better! This is because our Theorem assumed that our sorting algorithm was based on comparisons. If we know that our data to be sorted has some extra structure, it may be possible to sort it faster than in $O(n \log n)$, and also much faster than the built-in implementations!

CountSort

We have shown the techniques of counting and cumulative sums in previous lectures, but we have not yet tried to use them for sorting. Imagine that you are a teacher, just after grading the tests (let's say there are 200 of them), and you want to sort the test papers alphabetically by the students' last names, so that you can find them easily when the students complain. What is the quickest way to do this?

A method that turns out to be very good in practice is not based on any of the sorting algorithms above. It works as follows: we create 26 piles of tests, each for a different letter of the alphabet (A..Z). Then we look at every test, and we drop it on the pile corresponding to the first letter of the student's last name.

Then we simply collect all the piles in order. Of course this still leaves students not sorted correctly if they share the first letter of their names. Let's assume for now that it is good enough: every student is close to the expected position, and we can easily find them.

CountSort is an algorithm based on the idea above. It is used if there is a small number ($k$) of possible keys -- for example, when we are sorting 10000 people by their birth year (1900..1999 -- a small number of possibilities). Simply count the number of people born in each year, then for every key we compute where people having this key should start (using cumulative sums), and then place each person in their correct location.

For example, assume that we have the following people A (1907), B (1905), C (1907), D (1909). We count that there is 1 person born in 1905, 2 persons born in 1907, 1 person born in 1909. By using cumulative sums, we find out that people born in 1905 should start at position 0, people born in 1907 should start at position 1, and people born in 1909 should start at position 3. We look through our list again and move all the people to their correct positions (A is moved to 1, B is moved to 0, C is moved to 2 because 1 was already taken, D is moved to 3).
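
A Python sketch of CountSort for this example (the function count_sort and its key-extracting parameter are our own names for this illustration; keys are assumed to lie in the range lo..hi):
def count_sort(people, key, lo, hi):
    # count how many people have each key
    count = [0] * (hi - lo + 1)
    for p in people:
        count[key(p) - lo] += 1
    # cumulative sums: start[k] = position where people with key lo+k should start
    start = [0] * (hi - lo + 1)
    for k in range(1, hi - lo + 1):
        start[k] = start[k-1] + count[k-1]
    # place every person in their correct position (stable: equal keys keep their original order)
    result = [None] * len(people)
    for p in people:
        k = key(p) - lo
        result[start[k]] = p
        start[k] += 1
    return result

# the example above:
# count_sort([("A",1907), ("B",1905), ("C",1907), ("D",1909)], key=lambda p: p[1], lo=1900, hi=1999)
# returns [("B",1905), ("A",1907), ("C",1907), ("D",1909)]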

Time complexity of CountSort is $O(n+k)$, memory complexity also $O(n+k)$. This sorting algorithm is stable. This algorithm is not based on comparisons, which allows it to circumvent the Theorem mentioned above and run faster than $O(n \log n)$. As the time complexity suggests, it can work noticeably faster than the standard library algorithms -- but it needs a specific kind of data to work. However, it is often the case that the number of possible keys is small.

RadixSort

The CountSort examples given above are not completely satisfying. In the example with the students, we still had to sort every pile alphabetically by the remaining letters. In the birth year example, we have sorted people by their birth year, but what about month and day?

One way of solving this issue is using RadixSort. We will explain RadixSort on the birth year example. RadixSort works as follows: first we sort all the people by the day of birth (using a stable sorting algorithm, such as CountSort), then we sort them by the month, and finally we sort them by the year. Note that, since we were using a stable sorting algorithm in the last phase, people in every year will remain sorted with respect to month. Likewise, for every month and year, people will be correctly sorted with respect to the day. Therefore all the people are sorted correctly, and we did it quite quickly, in time $O((n+31)+(n+12)+(n+100)) = O(n)$! A similar method could also be used for alphabetical sorting, although it is more difficult there, because the names could be long.
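
Using the count_sort function sketched above, RadixSort for this example could look as follows (here a person is assumed to be a (name, year, month, day) tuple; the point is only the order of the passes):
def radix_sort_dates(people):
    # sort by the least significant part first and the most significant part last;
    # every pass is stable, so earlier passes are preserved among equal keys
    people = count_sort(people, key=lambda p: p[3], lo=1, hi=31)       # day
    people = count_sort(people, key=lambda p: p[2], lo=1, hi=12)       # month
    people = count_sort(people, key=lambda p: p[1], lo=1900, hi=1999)  # year
    return people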

Both CountSort and RadixSort are easy to implement, and they often can be used to sort data much quicker than the built-in sorting functions.
Exercise: we have to sort $N$ numbers in the range from $0$ to $N^2-1$. How to do this in $O(N)$ time?