1 00:00:00,640 --> 00:00:02,640 - [Instructor] Merge Sort. 2 00:00:02,640 --> 00:00:05,000 We start here on this slide with a photograph 3 00:00:05,000 --> 00:00:07,150 of computing pioneer John von Neumann 4 00:00:07,150 --> 00:00:09,410 standing next to part of one of the computers 5 00:00:09,410 --> 00:00:10,513 he helped design. 6 00:00:11,790 --> 00:00:14,100 I show his photo here because he was the inventor 7 00:00:14,100 --> 00:00:17,210 of merge sort back in 1945. 8 00:00:17,210 --> 00:00:20,820 Now merge sort is not like the other sorting algorithms 9 00:00:20,820 --> 00:00:22,000 that you've seen so far. 10 00:00:22,000 --> 00:00:24,393 It's a very different kind of animal. 11 00:00:25,370 --> 00:00:29,310 Merge sort is a divide-and-conquer algorithm, 12 00:00:29,310 --> 00:00:31,120 and divide-and-conquer algorithms 13 00:00:31,120 --> 00:00:33,650 break down a problem recursively, 14 00:00:33,650 --> 00:00:36,220 and then combine the results of subproblems 15 00:00:36,220 --> 00:00:37,963 to produce the final result. 16 00:00:40,100 --> 00:00:42,720 Now, let's think back to the first week of the course 17 00:00:42,720 --> 00:00:44,113 when we covered recursion. 18 00:00:45,030 --> 00:00:48,390 As you recall, recursive functions call themselves, 19 00:00:48,390 --> 00:00:50,240 that's what makes them recursive, 20 00:00:50,240 --> 00:00:54,300 and recursive functions require one or more base cases. 21 00:00:54,300 --> 00:00:55,410 Without a base case, 22 00:00:55,410 --> 00:00:58,190 a recursive function could call itself indefinitely, 23 00:00:58,190 --> 00:00:59,720 and never terminate. 24 00:00:59,720 --> 00:01:01,280 Hence we need the base cases. 25 00:01:01,280 --> 00:01:03,393 The base cases are not recursive. 26 00:01:04,570 --> 00:01:07,230 The examples we've seen earlier in the course 27 00:01:07,230 --> 00:01:09,563 are Fibonacci and vectorial. 28 00:01:10,530 --> 00:01:14,020 Let's refresh our memory with regard to Fibonacci. 29 00:01:14,020 --> 00:01:16,790 The Fibonacci sequence starts with zero and one, 30 00:01:16,790 --> 00:01:20,340 and then calculates subsequent terms by taking the sum 31 00:01:20,340 --> 00:01:23,200 of the previous two terms in the sequence. 32 00:01:23,200 --> 00:01:26,200 So a Fibonacci number, denoted F sub n, 33 00:01:26,200 --> 00:01:29,540 where n is an index, is calculated thus. 34 00:01:29,540 --> 00:01:31,010 F sub n is zero. 35 00:01:31,010 --> 00:01:33,880 If n is zero F sub n is one. 36 00:01:33,880 --> 00:01:38,880 If n is 1 then F sub n is equal to F sub n - 2 37 00:01:39,490 --> 00:01:42,283 plus F sub n - 1 otherwise. 38 00:01:43,990 --> 00:01:46,173 So these are our base cases, 39 00:01:47,550 --> 00:01:49,860 and this is the recursive case. 40 00:01:49,860 --> 00:01:52,360 And we can think of this as breaking the problem down 41 00:01:52,360 --> 00:01:54,410 into smaller subproblems. 42 00:01:54,410 --> 00:01:57,020 The nth Fibonacci number is the sum 43 00:01:57,020 --> 00:01:59,540 of two smaller Fibonacci numbers, 44 00:01:59,540 --> 00:02:02,480 F sub n - 2 and F sub n - 1. 45 00:02:02,480 --> 00:02:05,090 And the solutions to these smaller problems 46 00:02:05,090 --> 00:02:08,260 are combined to produce the final result. 47 00:02:08,260 --> 00:02:12,423 Here, we combined the solutions to sub problems by addition. 48 00:02:14,620 --> 00:02:16,150 But what about merge sort? 49 00:02:16,150 --> 00:02:18,470 Well, merge sort breaks the problem 50 00:02:18,470 --> 00:02:21,190 of sorting down into smaller subproblems. 51 00:02:21,190 --> 00:02:23,350 This is done recursively. 52 00:02:23,350 --> 00:02:24,913 Now we need a base case. 53 00:02:26,100 --> 00:02:28,713 Base case is a vector of a single element. 54 00:02:29,570 --> 00:02:32,580 The single element is sorted by definition. 55 00:02:32,580 --> 00:02:34,580 Now that may seem trivial, 56 00:02:34,580 --> 00:02:35,620 but it is, in fact, 57 00:02:35,620 --> 00:02:37,173 the base case for merge sort. 58 00:02:39,030 --> 00:02:41,080 And how do we produce a solution? 59 00:02:41,080 --> 00:02:42,520 In the case of Fibonacci 60 00:02:42,520 --> 00:02:44,710 we just added the two subsolutions, 61 00:02:44,710 --> 00:02:47,550 but we can't do that when we're sorting. 62 00:02:47,550 --> 00:02:48,673 So what do we do? 63 00:02:49,990 --> 00:02:50,940 We merge. 64 00:02:50,940 --> 00:02:53,193 And this is where merge sort gets it's name. 65 00:02:54,230 --> 00:02:55,550 What is merging? 66 00:02:55,550 --> 00:02:58,260 Well, here's a simple case. 67 00:02:58,260 --> 00:03:00,800 We have 2 one element vectors, 68 00:03:00,800 --> 00:03:05,050 and we merge them to produce 1 two element vectors. 69 00:03:05,050 --> 00:03:08,670 Notice that both of the source vectors we're emerging, 70 00:03:08,670 --> 00:03:10,410 are already sorted. 71 00:03:10,410 --> 00:03:12,010 They're one element vectors, 72 00:03:12,010 --> 00:03:13,560 and they are sorted by default. 73 00:03:14,460 --> 00:03:16,287 Now you're probably thinking, 74 00:03:16,287 --> 00:03:17,980 "Oh, that's just brilliant, 75 00:03:17,980 --> 00:03:19,810 but how does this help me sort a vector 76 00:03:19,810 --> 00:03:21,830 of a million elements?" 77 00:03:21,830 --> 00:03:24,330 All I can say right now is bear with me, 78 00:03:24,330 --> 00:03:26,013 it'll all become clear in time. 79 00:03:28,120 --> 00:03:30,240 So here's a more complicated example. 80 00:03:30,240 --> 00:03:33,450 Here we have to compare the single element on the left 81 00:03:33,450 --> 00:03:35,150 with the single element on the right, 82 00:03:35,150 --> 00:03:37,810 and we use this comparison to merge the elements 83 00:03:37,810 --> 00:03:40,543 from the two source vectors in the correct order. 84 00:03:41,980 --> 00:03:43,003 So far, so good. 85 00:03:44,770 --> 00:03:46,670 Here's another example. 86 00:03:46,670 --> 00:03:49,570 We've already merged 4 one element vectors 87 00:03:49,570 --> 00:03:51,510 into 2 two element vectors, 88 00:03:51,510 --> 00:03:55,590 and now we want to merge them into 1 four element vector. 89 00:03:55,590 --> 00:03:58,000 Now this case is pretty simple. 90 00:03:58,000 --> 00:04:00,350 But what if we wanted to write a function 91 00:04:00,350 --> 00:04:03,460 that can merge any two sorted vectors 92 00:04:03,460 --> 00:04:06,220 into a single, larger sorted vector? 93 00:04:06,220 --> 00:04:07,353 How would that work? 94 00:04:08,410 --> 00:04:09,880 Obviously it would have to be able 95 00:04:09,880 --> 00:04:12,360 to handle a case like this, 96 00:04:12,360 --> 00:04:17,010 but notice that the two input vectors are sorted already. 97 00:04:17,010 --> 00:04:19,453 So, we can use that information. 98 00:04:22,130 --> 00:04:24,300 We know those two input vectors are sorted. 99 00:04:24,300 --> 00:04:27,950 So the smallest value in each will be the left-most position 100 00:04:27,950 --> 00:04:29,283 so we can start there. 101 00:04:30,230 --> 00:04:31,960 So let's begin by figuring out 102 00:04:31,960 --> 00:04:33,290 what should be the first element 103 00:04:33,290 --> 00:04:35,040 in the result vector at the bottom. 104 00:04:35,994 --> 00:04:37,290 Now we just compare the two values 105 00:04:37,290 --> 00:04:40,100 at the indicated positions in the source vectors, 106 00:04:40,100 --> 00:04:42,563 and put that value into our result vector. 107 00:04:43,560 --> 00:04:44,543 Simple, one. 108 00:04:46,040 --> 00:04:46,873 Now what? 109 00:04:46,873 --> 00:04:48,590 Well, we've used an element 110 00:04:48,590 --> 00:04:50,910 from the right-hand source vector, 111 00:04:50,910 --> 00:04:53,450 and we don't want to use that element again. 112 00:04:53,450 --> 00:04:55,483 So let's move that arrow to the right. 113 00:04:57,830 --> 00:05:00,600 And we want to move the arrow on the result vector 114 00:05:00,600 --> 00:05:02,573 to point to the next empty position. 115 00:05:03,410 --> 00:05:06,450 And now, we repeat what we did before. 116 00:05:06,450 --> 00:05:08,280 We compare the elements indicated 117 00:05:08,280 --> 00:05:09,930 by the arrows and the source vectors, 118 00:05:09,930 --> 00:05:11,560 and choose the smaller of the two. 119 00:05:11,560 --> 00:05:13,593 Here the value is two. 120 00:05:14,950 --> 00:05:17,820 We put that into our result vector. 121 00:05:17,820 --> 00:05:22,733 And then, we move the arrows as we did before and repeat. 122 00:05:24,250 --> 00:05:26,780 Again, we choose the smaller of the two values three, 123 00:05:26,780 --> 00:05:29,780 and put it into the result vector in the indicated position. 124 00:05:31,300 --> 00:05:34,120 We've used up all the values in the left-hand source vector. 125 00:05:34,120 --> 00:05:36,130 So we're not going to choose any more values 126 00:05:36,130 --> 00:05:37,083 from that vector. 127 00:05:38,030 --> 00:05:40,700 And, we only have one value to choose 128 00:05:40,700 --> 00:05:42,460 in the right-hand side. 129 00:05:42,460 --> 00:05:45,703 So we're gonna pick that to complete our result vector. 130 00:05:47,960 --> 00:05:49,150 And our merge is complete. 131 00:05:49,150 --> 00:05:50,900 Now that wasn't too difficult. 132 00:05:50,900 --> 00:05:54,360 The important thing is that this process works just as well 133 00:05:54,360 --> 00:05:56,390 with vectors of any size. 134 00:05:56,390 --> 00:05:59,770 If we want to merge 2 million element vectors, 135 00:05:59,770 --> 00:06:01,700 the process would be the same. 136 00:06:01,700 --> 00:06:02,660 Take a little longer, 137 00:06:02,660 --> 00:06:04,360 but the process would be the same. 138 00:06:06,040 --> 00:06:07,870 Now, for efficiency's sake, 139 00:06:07,870 --> 00:06:10,390 we can use a single vector as a source, 140 00:06:10,390 --> 00:06:13,823 so long as it consists of two sorted subvectors. 141 00:06:14,870 --> 00:06:18,163 So, let's expand on our example just a bit. 142 00:06:19,420 --> 00:06:23,120 Here we have our two input vectors, 143 00:06:23,120 --> 00:06:27,620 actually they're sorted subvectors of four elements each, 144 00:06:27,620 --> 00:06:30,110 and we want to merge these into a single vector 145 00:06:30,110 --> 00:06:31,783 of eight sorted elements. 146 00:06:32,740 --> 00:06:35,900 We need to keep track of where our subvectors begin and end 147 00:06:35,900 --> 00:06:38,805 so we know which is the left subvector, 148 00:06:38,805 --> 00:06:40,870 and which is the right subvector. 149 00:06:40,870 --> 00:06:44,410 And so, the input on the left-hand side starts at start, 150 00:06:44,410 --> 00:06:46,210 we're calling that position start, 151 00:06:46,210 --> 00:06:47,453 and runs to middle. 152 00:06:48,300 --> 00:06:50,170 And the input on the right-hand side 153 00:06:50,170 --> 00:06:52,443 runs from middle plus one to the end. 154 00:06:53,890 --> 00:06:54,850 Then, as before, 155 00:06:54,850 --> 00:06:56,520 we have three indices, 156 00:06:56,520 --> 00:06:58,370 one into the left-hand input, 157 00:06:58,370 --> 00:06:59,920 we'll call that i, 158 00:06:59,920 --> 00:07:01,600 one into the right-hand input, 159 00:07:01,600 --> 00:07:03,340 We'll call that j, 160 00:07:03,340 --> 00:07:05,070 and one into the result vector, 161 00:07:05,070 --> 00:07:06,530 We'll call that k. 162 00:07:06,530 --> 00:07:07,913 So i, j, and k. 163 00:07:09,000 --> 00:07:11,580 With this setup, everything proceeds as it did 164 00:07:11,580 --> 00:07:13,510 in our earlier example. 165 00:07:13,510 --> 00:07:15,500 What goes into position k? 166 00:07:15,500 --> 00:07:18,450 The smaller of the values indexed by i and j. 167 00:07:18,450 --> 00:07:19,433 In this case one. 168 00:07:20,760 --> 00:07:22,400 We increment our indices, 169 00:07:22,400 --> 00:07:24,803 adding one to j and one to k. 170 00:07:25,650 --> 00:07:27,120 What's next? 171 00:07:27,120 --> 00:07:29,660 Well, two's smaller than three so we choose two 172 00:07:29,660 --> 00:07:30,933 from the right-hand side. 173 00:07:31,960 --> 00:07:34,180 And we increment our indices. 174 00:07:34,180 --> 00:07:35,540 What's next? 175 00:07:35,540 --> 00:07:38,270 We choose three from the left-hand side, 176 00:07:38,270 --> 00:07:40,573 and we increment the appropriate indices. 177 00:07:42,260 --> 00:07:45,290 Now four from the right-hand side. 178 00:07:45,290 --> 00:07:46,203 Five, 179 00:07:47,230 --> 00:07:48,223 six, 180 00:07:49,390 --> 00:07:51,583 seven from the right hand side, 181 00:07:52,490 --> 00:07:54,453 and eight from the left-hand side. 182 00:07:56,250 --> 00:07:59,460 And now we can copy the values that we've produced 183 00:07:59,460 --> 00:08:01,110 back to the original input vector, 184 00:08:01,110 --> 00:08:05,163 and we have a sorted, merged vector. 185 00:08:07,740 --> 00:08:10,560 So merge sort works by breaking the problem down 186 00:08:10,560 --> 00:08:12,960 into single-element vectors, 187 00:08:12,960 --> 00:08:14,740 or single-element subvectors, 188 00:08:14,740 --> 00:08:16,460 as the case may be. 189 00:08:16,460 --> 00:08:19,720 And then building a solution through successive merges. 190 00:08:19,720 --> 00:08:22,630 And merging really is at the heart of this algorithm. 191 00:08:22,630 --> 00:08:24,800 Breaking the problem down into its components 192 00:08:24,800 --> 00:08:25,853 is almost trivial. 193 00:08:28,330 --> 00:08:30,410 So here's the big picture. 194 00:08:30,410 --> 00:08:35,050 The algorithm splits the input into one element vectors. 195 00:08:35,050 --> 00:08:38,000 A one element vector represents the base case 196 00:08:38,000 --> 00:08:39,060 in the recursions. 197 00:08:39,060 --> 00:08:41,890 And so one element vector is already sorted. 198 00:08:41,890 --> 00:08:44,820 And then using the merge algorithm we just saw, 199 00:08:44,820 --> 00:08:48,700 merge sort builds the solution from the subsolutions. 200 00:08:48,700 --> 00:08:50,243 Divide-and-conquer. 201 00:08:52,170 --> 00:08:53,980 You probably know the Simpsons, 202 00:08:53,980 --> 00:08:55,670 and this was intended as a gag, 203 00:08:55,670 --> 00:08:56,920 but if you look carefully, 204 00:08:56,920 --> 00:09:00,110 Marge sort is exactly the same thing as merge sort. 205 00:09:00,110 --> 00:09:03,480 And this diagram is an accurate depiction of merge sort. 206 00:09:03,480 --> 00:09:04,883 It's a perfect example. 207 00:09:05,950 --> 00:09:10,260 We break the problem up into vectors of length one. 208 00:09:10,260 --> 00:09:14,683 Then we merge them putting items into the appropriate order. 209 00:09:15,610 --> 00:09:17,010 Divide-and-conquer. 210 00:09:17,010 --> 00:09:18,703 Marge sort is merge sort. 211 00:09:21,140 --> 00:09:22,970 Here's some pseudo-code. 212 00:09:22,970 --> 00:09:24,830 First we have the base case 213 00:09:24,830 --> 00:09:28,590 where the vector contains just one element and we return. 214 00:09:28,590 --> 00:09:31,960 Otherwise, we divide the input into two parts, 215 00:09:31,960 --> 00:09:34,010 and make two recursive calls, 216 00:09:34,010 --> 00:09:35,170 one on the left part, 217 00:09:35,170 --> 00:09:36,920 and one on the right part. 218 00:09:36,920 --> 00:09:39,270 And when these recursive calls return, 219 00:09:39,270 --> 00:09:40,773 we merge the results. 220 00:09:42,010 --> 00:09:43,263 That's merge sort. 221 00:09:44,910 --> 00:09:46,820 Now what about complexity? 222 00:09:46,820 --> 00:09:49,550 Well merge sort is very efficient. 223 00:09:49,550 --> 00:09:51,200 Dividing the vector is simple. 224 00:09:51,200 --> 00:09:53,330 We just calculate the midpoint of a vector, 225 00:09:53,330 --> 00:09:55,260 and divide it into two parts. 226 00:09:55,260 --> 00:09:56,853 This takes constant time. 227 00:09:57,750 --> 00:09:58,773 Big O of one. 228 00:09:59,700 --> 00:10:01,340 Merging, as you have seen, 229 00:10:01,340 --> 00:10:03,180 can be done in linear time. 230 00:10:03,180 --> 00:10:04,400 Given n elements, 231 00:10:04,400 --> 00:10:06,440 we can merge them in n steps. 232 00:10:06,440 --> 00:10:07,493 That's big O of n. 233 00:10:08,640 --> 00:10:11,020 But we have to account for the recursion. 234 00:10:11,020 --> 00:10:13,040 Whatever time it takes to perform 235 00:10:13,040 --> 00:10:16,230 division and merging at a given level, 236 00:10:16,230 --> 00:10:18,950 it will take two times the amount of time it takes 237 00:10:18,950 --> 00:10:22,410 to perform the same work on a vector half the size, 238 00:10:22,410 --> 00:10:24,150 because we have two subvectors, 239 00:10:24,150 --> 00:10:26,170 for the recursive call. 240 00:10:26,170 --> 00:10:27,963 So we have this recurrence. 241 00:10:29,310 --> 00:10:31,210 T of n, where T is the time 242 00:10:31,210 --> 00:10:34,870 it takes to process input of n elements, 243 00:10:34,870 --> 00:10:36,760 is equal to two times 244 00:10:36,760 --> 00:10:40,690 the time it takes to process an element of half the size, 245 00:10:40,690 --> 00:10:43,770 plus what it takes to process the current level. 246 00:10:43,770 --> 00:10:45,990 Now if we solve this recurrence, 247 00:10:45,990 --> 00:10:48,363 we get big O of n log n. 248 00:10:49,370 --> 00:10:52,560 You'll recall that a complete binary tree of n elements 249 00:10:52,560 --> 00:10:54,000 has log n levels, 250 00:10:54,000 --> 00:10:55,700 and that's part of the reason why 251 00:10:55,700 --> 00:10:58,950 this recurrence is solved this way. 252 00:10:58,950 --> 00:11:01,420 Because as we break our problem down 253 00:11:01,420 --> 00:11:04,130 into sub trees and then merge them, 254 00:11:04,130 --> 00:11:06,640 we have log n levels to work with. 255 00:11:09,233 --> 00:11:10,460 Is merge sort stable? 256 00:11:10,460 --> 00:11:11,293 Yes. 257 00:11:11,293 --> 00:11:14,540 Merge sort preserves the relative order of equal values 258 00:11:14,540 --> 00:11:15,673 in the input vector. 259 00:11:17,130 --> 00:11:20,390 Let's go back to that comparison table we've seen before, 260 00:11:20,390 --> 00:11:22,153 now with merge sort added. 261 00:11:23,730 --> 00:11:26,800 You'll see that merge sort has time complexity 262 00:11:26,800 --> 00:11:29,253 of O of n log n. 263 00:11:30,270 --> 00:11:33,130 But it also has space complexity of O of n 264 00:11:33,130 --> 00:11:36,120 which is more than the other sort algorithms 265 00:11:36,120 --> 00:11:37,820 that we've seen so far. 266 00:11:37,820 --> 00:11:40,150 That's because we need that temporary vector 267 00:11:40,150 --> 00:11:41,953 to hold our merge results. 268 00:11:42,830 --> 00:11:46,113 It's stable and it's a divide-and-conquer algorithm. 269 00:11:48,050 --> 00:11:51,283 Remember this plot from our introduction to complexity? 270 00:11:52,380 --> 00:11:55,460 O of n squared 271 00:11:55,460 --> 00:11:58,120 is this solid 272 00:11:58,120 --> 00:11:59,993 orange curve here. 273 00:12:00,840 --> 00:12:03,960 And O of n log n 274 00:12:03,960 --> 00:12:07,300 is this dashed green curve. 275 00:12:07,300 --> 00:12:10,850 And from this, you can see that O of n log n 276 00:12:10,850 --> 00:12:14,230 grows much more slowly than O of n squared. 277 00:12:14,230 --> 00:12:16,610 So merge sort is much faster 278 00:12:16,610 --> 00:12:20,073 than any of the other sorting algorithms we've seen so far. 279 00:12:22,440 --> 00:12:26,710 So merge sort is a recursive, divide-and-conquer algorithm. 280 00:12:26,710 --> 00:12:29,050 We'll see other divide-and-conquer algorithms 281 00:12:29,050 --> 00:12:29,933 in this course. 282 00:12:30,930 --> 00:12:34,763 Merge sort has O of n log n time complexity. 283 00:12:35,690 --> 00:12:39,800 Merge sort has O of n space complexity. 284 00:12:39,800 --> 00:12:41,163 And merge sort is stable. 285 00:12:42,200 --> 00:12:47,200 In the next video, we will implement merge sort in C++. 286 00:12:48,500 --> 00:12:49,450 That's all for now.