1 00:00:00,590 --> 00:00:02,240 - [Instructor] The last sorting algorithm 2 00:00:02,240 --> 00:00:04,840 we'll explore is heap sort. 3 00:00:04,840 --> 00:00:07,530 Heap sort makes use of a binary heap. 4 00:00:07,530 --> 00:00:10,970 We use a binary heap to hold our unsorted elements, 5 00:00:10,970 --> 00:00:13,240 and then remove them in sorted order 6 00:00:13,240 --> 00:00:15,163 as we would with a priority queue. 7 00:00:16,800 --> 00:00:20,210 First let's recall from our earlier presentations, 8 00:00:20,210 --> 00:00:23,010 what exactly is a binary heap? 9 00:00:23,010 --> 00:00:26,660 A binary heap is a binary tree with the heap properties. 10 00:00:26,660 --> 00:00:30,900 There are two, first, the tree must be complete. 11 00:00:30,900 --> 00:00:33,540 This means that each level is full 12 00:00:33,540 --> 00:00:36,660 with the possible exception of the last level. 13 00:00:36,660 --> 00:00:38,450 The last level may be incomplete, 14 00:00:38,450 --> 00:00:40,440 but it should be filled from left to right, 15 00:00:40,440 --> 00:00:42,253 we call this the structure property. 16 00:00:44,280 --> 00:00:47,243 The other property is called the heap order property, 17 00:00:48,105 --> 00:00:49,930 and that is that with the exception of the root, 18 00:00:49,930 --> 00:00:53,053 every node must be ordered with respect to its parent. 19 00:00:54,510 --> 00:00:56,006 Well, you'll recall again 20 00:00:56,006 --> 00:00:58,850 that we have two ways that we can order a heap. 21 00:00:58,850 --> 00:01:00,920 We can have a min heap or a max heap. 22 00:01:00,920 --> 00:01:05,110 A min heap is where we have the minimal value at the root, 23 00:01:05,110 --> 00:01:07,830 and then child nodes must have a value 24 00:01:07,830 --> 00:01:10,383 greater than or equal to that of the parent node. 25 00:01:11,910 --> 00:01:14,610 In a max heap, the maximal value is at the root, 26 00:01:14,610 --> 00:01:16,670 and the child nodes must have values 27 00:01:16,670 --> 00:01:20,000 less than or equal to that of the parent node. 28 00:01:20,000 --> 00:01:22,060 For our implementation of heap sort, 29 00:01:22,060 --> 00:01:25,643 and for these demonstrations, we're gonna use max heap. 30 00:01:27,410 --> 00:01:31,450 Let's also recall what happens when we remove the root. 31 00:01:31,450 --> 00:01:35,100 Since we have a max heap, the maximum value is at the root, 32 00:01:35,100 --> 00:01:36,350 and we're gonna sort our vector 33 00:01:36,350 --> 00:01:39,743 by repeatedly removing the element at the root of our heap. 34 00:01:40,830 --> 00:01:43,340 Now, when we delete the root of a binary heap, 35 00:01:43,340 --> 00:01:46,510 we have to restore the heap order property. 36 00:01:46,510 --> 00:01:48,360 And we do that by percolating down 37 00:01:48,360 --> 00:01:51,220 until we find a new position for the orphaned value. 38 00:01:51,220 --> 00:01:52,970 Now, what do I mean by that? 39 00:01:52,970 --> 00:01:56,150 You'll recall from our earlier presentations 40 00:01:56,150 --> 00:01:59,360 that when we removed an element at the root, 41 00:01:59,360 --> 00:02:01,223 we sort of left an empty bubble, 42 00:02:02,100 --> 00:02:05,300 and there was one value in the lowest level 43 00:02:05,300 --> 00:02:08,730 of the binary heap that we needed to find a new home for it, 44 00:02:08,730 --> 00:02:11,310 that's what we call the orphaned value. 45 00:02:11,310 --> 00:02:14,050 And we let that empty bubble percolate down 46 00:02:14,050 --> 00:02:16,890 through the heap until we found a place 47 00:02:16,890 --> 00:02:19,113 where that orphaned value could reside. 48 00:02:20,080 --> 00:02:23,140 Now, what we're gonna do here is we're gonna do this 49 00:02:23,140 --> 00:02:26,303 by performing a series of swaps within our vector. 50 00:02:27,690 --> 00:02:29,190 Now, you may think of heap sort 51 00:02:29,190 --> 00:02:31,563 as an improved selection sort. 52 00:02:32,430 --> 00:02:34,220 Where a selection sort has to scan 53 00:02:34,220 --> 00:02:36,540 the entire unsorted portion of the vector 54 00:02:36,540 --> 00:02:38,020 to find the next element, 55 00:02:38,020 --> 00:02:42,250 heap sort maintains the unsorted data in a binary heap. 56 00:02:42,250 --> 00:02:44,610 And so we can always find the next element 57 00:02:44,610 --> 00:02:47,253 because it's gonna be at the root of the heap. 58 00:02:49,070 --> 00:02:51,040 So when we look at our vector, 59 00:02:51,040 --> 00:02:53,950 we're going to have a portion of our vector 60 00:02:53,950 --> 00:02:57,180 which is a binary heap, and that's our unsorted elements 61 00:02:57,180 --> 00:02:59,337 with the maximum value at the root. 62 00:02:59,337 --> 00:03:02,100 And then we're gonna have a portion of our vector 63 00:03:02,100 --> 00:03:04,040 that has the sorted elements. 64 00:03:04,040 --> 00:03:05,760 Now, of course, when we start, 65 00:03:05,760 --> 00:03:09,050 the entire vector will be a binary heap, 66 00:03:09,050 --> 00:03:10,810 and there will be no sorted elements, 67 00:03:10,810 --> 00:03:12,980 but we'll see the sorted elements 68 00:03:12,980 --> 00:03:16,430 accumulate on the right hand-side of our vector 69 00:03:16,430 --> 00:03:18,203 as we iterate through heap sort. 70 00:03:21,030 --> 00:03:23,720 Here is some high level pseudo code. 71 00:03:23,720 --> 00:03:25,433 Heap sort takes a vector. 72 00:03:26,310 --> 00:03:28,340 We have another function called heapify 73 00:03:28,340 --> 00:03:29,790 which also takes a vector, 74 00:03:29,790 --> 00:03:33,780 and what that does is it takes the vector, the input vector, 75 00:03:33,780 --> 00:03:36,090 and turns it into a binary heap. 76 00:03:36,090 --> 00:03:39,900 It makes the necessary swaps, does whatever we need to do 77 00:03:39,900 --> 00:03:44,510 to turn the input vector into a valid binary heap. 78 00:03:44,510 --> 00:03:45,880 It just does that by swapping. 79 00:03:45,880 --> 00:03:47,590 We'll see more about how that works 80 00:03:47,590 --> 00:03:50,410 when we implement this in C++. 81 00:03:50,410 --> 00:03:52,180 And then we have a while loop. 82 00:03:52,180 --> 00:03:55,780 As long as there are unsorted elements in our vector, 83 00:03:55,780 --> 00:03:57,780 we remove the root from the heap 84 00:03:57,780 --> 00:03:59,970 and append that to our sorted values, 85 00:03:59,970 --> 00:04:01,240 that's gonna be a maximum, 86 00:04:01,240 --> 00:04:03,520 so the largest values will accumulate 87 00:04:03,520 --> 00:04:05,430 on the right-hand side. 88 00:04:05,430 --> 00:04:06,840 And then we have to take steps 89 00:04:06,840 --> 00:04:09,060 to restore the heap order property 90 00:04:09,060 --> 00:04:11,060 on the unsorted elements. 91 00:04:11,060 --> 00:04:13,840 And we'll do that by that percolating down process 92 00:04:13,840 --> 00:04:16,820 that we saw when we introduced binary heaps 93 00:04:16,820 --> 00:04:17,893 a couple weeks back. 94 00:04:20,160 --> 00:04:22,550 So we're gonna need that heapify function 95 00:04:22,550 --> 00:04:25,190 and we're gonna need that percolate down function. 96 00:04:25,190 --> 00:04:27,270 Again, you'll see more of how this works 97 00:04:27,270 --> 00:04:29,613 when we implement heap sort in C++. 98 00:04:30,960 --> 00:04:31,793 But to start, 99 00:04:31,793 --> 00:04:34,660 let's say we have this input vector at the top, 100 00:04:34,660 --> 00:04:36,580 eight, five, zero, four, three, nine, 101 00:04:36,580 --> 00:04:38,033 one, seven, two, six. 102 00:04:39,040 --> 00:04:41,220 And then we use our heapify function 103 00:04:41,220 --> 00:04:42,500 that I referred to earlier 104 00:04:42,500 --> 00:04:46,380 to transform this vector into a valid binary heap. 105 00:04:46,380 --> 00:04:48,930 And this is done by performing the required swaps 106 00:04:48,930 --> 00:04:51,080 to give our vector the heap order property. 107 00:04:52,060 --> 00:04:57,060 Now, if we run heapify, this vector becomes transformed 108 00:04:57,210 --> 00:04:59,450 into this vector, nine, seven eight, 109 00:04:59,450 --> 00:05:01,637 five, six, zero, one, four, two, three. 110 00:05:03,730 --> 00:05:06,560 You may say, well, how do I know that's a binary heap? 111 00:05:06,560 --> 00:05:08,160 Well, let's look at it this way. 112 00:05:09,230 --> 00:05:11,010 Here we have nine at the root, 113 00:05:11,010 --> 00:05:13,470 you'll recall there's a very precise mapping 114 00:05:13,470 --> 00:05:16,740 between the position of elements 115 00:05:16,740 --> 00:05:21,660 in a binary heaps tree representation 116 00:05:21,660 --> 00:05:24,400 with the representation in a vector. 117 00:05:24,400 --> 00:05:28,400 So this nine here is the nine at the root. 118 00:05:28,400 --> 00:05:31,990 Then there's level below that with seven and eight. 119 00:05:31,990 --> 00:05:35,750 Notice that seven and eight are both less than nine. 120 00:05:35,750 --> 00:05:38,240 And then we have another level below that, 121 00:05:38,240 --> 00:05:40,240 five, six, zero, one. 122 00:05:40,240 --> 00:05:42,590 Notice that five and six are less than seven, 123 00:05:42,590 --> 00:05:44,300 zero and one are less than eight, 124 00:05:44,300 --> 00:05:47,670 so, so far the heap order property looks good. 125 00:05:47,670 --> 00:05:49,310 Then we have another level, 126 00:05:49,310 --> 00:05:51,450 notice that this level is not complete 127 00:05:51,450 --> 00:05:54,610 but it is filled from the left to the right. 128 00:05:54,610 --> 00:05:56,200 So we have four, two, three, 129 00:05:56,200 --> 00:05:58,890 four and two are less than five, we're good there, 130 00:05:58,890 --> 00:06:00,500 three is less than six. 131 00:06:00,500 --> 00:06:04,280 So that looks like a perfectly good binary heap. 132 00:06:04,280 --> 00:06:07,293 And so this is where we're going to start our iteration. 133 00:06:08,850 --> 00:06:11,800 So to run our algorithm in place, 134 00:06:11,800 --> 00:06:15,410 rather than just popping things off the top of the heap, 135 00:06:16,310 --> 00:06:18,630 we're going to do a swap. 136 00:06:18,630 --> 00:06:21,580 So here, this is the value three 137 00:06:21,580 --> 00:06:26,580 that would be orphaned if we were to remove nine, okay? 138 00:06:26,640 --> 00:06:29,980 So what we're gonna do is we're gonna swap nine and three, 139 00:06:29,980 --> 00:06:31,000 swap nine and three, 140 00:06:31,000 --> 00:06:33,270 notice nine will go to the end of the vector 141 00:06:33,270 --> 00:06:36,850 which is gonna become our sorted portion of the vector. 142 00:06:36,850 --> 00:06:39,980 Three will swap with nine, that will appear at the root, 143 00:06:39,980 --> 00:06:41,470 and then we're gonna have to restore 144 00:06:41,470 --> 00:06:42,723 the heap order property. 145 00:06:44,070 --> 00:06:46,370 So now we've swapped nine and three, 146 00:06:46,370 --> 00:06:48,750 we've marked nine as sorted. 147 00:06:48,750 --> 00:06:52,100 Notice that nine is no longer considered part of the heap, 148 00:06:52,100 --> 00:06:55,630 but we have that problem, having three in the root position 149 00:06:55,630 --> 00:06:58,520 as we do here violates the heap order property. 150 00:06:58,520 --> 00:07:00,440 So we'll percolate this value down 151 00:07:00,440 --> 00:07:02,850 until we find a suitable place for it. 152 00:07:02,850 --> 00:07:05,610 And we'll do that by performing swaps. 153 00:07:05,610 --> 00:07:08,240 Now, here, it's easy, we just have one step. 154 00:07:08,240 --> 00:07:10,140 We swapped three with eight, 155 00:07:10,140 --> 00:07:13,190 and that's gonna give us a binary heap. 156 00:07:13,190 --> 00:07:15,530 So here we've swapped three and eight. 157 00:07:15,530 --> 00:07:18,700 Now eight is in the root position. 158 00:07:18,700 --> 00:07:23,170 Three is below eight, and below three are zero and one, 159 00:07:23,170 --> 00:07:25,183 heap order property is intact. 160 00:07:26,960 --> 00:07:28,810 And the next element that we're gonna remove 161 00:07:28,810 --> 00:07:30,273 from the heap is eight. 162 00:07:31,200 --> 00:07:32,670 So what are we gonna do with eight? 163 00:07:32,670 --> 00:07:34,683 We're gonna swap it with two. 164 00:07:36,060 --> 00:07:37,960 So now we've swapped eight and two, 165 00:07:37,960 --> 00:07:40,050 and we mark eight as being sorted. 166 00:07:40,050 --> 00:07:41,760 But now two is at the root, 167 00:07:41,760 --> 00:07:43,680 and we need to restore the heap order property. 168 00:07:43,680 --> 00:07:47,310 So again, we percolate down, choosing the larger element, 169 00:07:47,310 --> 00:07:49,150 which one of these do we swap with? 170 00:07:49,150 --> 00:07:52,440 Seven and three, we swap with the larger, okay? 171 00:07:52,440 --> 00:07:54,773 So we're gonna swap seven and two. 172 00:07:56,610 --> 00:07:59,330 Now we've swapped seven and two, do we have a heap yet? 173 00:07:59,330 --> 00:08:00,540 No, why? 174 00:08:00,540 --> 00:08:03,290 Because five and six are greater than two. 175 00:08:03,290 --> 00:08:06,310 The heap order property is not intact. 176 00:08:06,310 --> 00:08:07,360 So we're gonna swap 177 00:08:07,360 --> 00:08:10,143 with the larger of those two, two and six. 178 00:08:11,320 --> 00:08:13,950 So we swap two and six, 179 00:08:13,950 --> 00:08:16,910 and now the heap order property is restored, 180 00:08:16,910 --> 00:08:18,363 and seven is at the root. 181 00:08:20,090 --> 00:08:21,860 Now, in the next iteration, 182 00:08:21,860 --> 00:08:26,573 we're going to swap seven with four, seven and four, okay? 183 00:08:29,580 --> 00:08:31,363 So we'll swap seven and four. 184 00:08:32,860 --> 00:08:36,630 Now seven is sorted and four is at the root position. 185 00:08:36,630 --> 00:08:39,133 Again, we need to restore the heap order property. 186 00:08:40,300 --> 00:08:42,403 So we'll swap four and six, 187 00:08:43,640 --> 00:08:45,180 and we'll need to swap one more time 188 00:08:45,180 --> 00:08:47,163 to restore the heap order property. 189 00:08:48,380 --> 00:08:49,993 We'll swap four and five. 190 00:08:51,330 --> 00:08:53,180 And now we have three elements sorted, 191 00:08:53,180 --> 00:08:54,550 seven, eight and nine, 192 00:08:54,550 --> 00:08:57,053 and we've consumed the lowest level of our heap. 193 00:08:58,010 --> 00:09:00,800 So to remove six and add it to our sorted portion, 194 00:09:00,800 --> 00:09:02,253 we'll swap six and one. 195 00:09:04,440 --> 00:09:05,780 I think you get the idea now, 196 00:09:05,780 --> 00:09:08,430 so let's move through the rest a little more quickly. 197 00:09:09,770 --> 00:09:11,463 We'll swap one and five. 198 00:09:12,930 --> 00:09:14,683 We'll swap one and four, 199 00:09:15,700 --> 00:09:17,900 and now we've restored the heap order property 200 00:09:17,900 --> 00:09:19,740 and five is at the root. 201 00:09:19,740 --> 00:09:23,083 Now we're gonna remove five by swapping it with zero. 202 00:09:25,420 --> 00:09:28,450 Now, five has been swapped with zero, five is sorted. 203 00:09:28,450 --> 00:09:31,250 Now we have five, six, seven, eight, nine sorted. 204 00:09:31,250 --> 00:09:32,500 Zero is at the root, 205 00:09:32,500 --> 00:09:34,853 so we have to restore the heap order property. 206 00:09:35,910 --> 00:09:37,923 So we swap zero and four. 207 00:09:39,210 --> 00:09:41,173 Now we swap zero and two, 208 00:09:43,230 --> 00:09:46,900 and now we have a valid binary heap. 209 00:09:46,900 --> 00:09:49,160 Now we're gonna remove four but from the heap 210 00:09:49,160 --> 00:09:50,663 by swapping it with zero, 211 00:09:52,050 --> 00:09:55,770 and zero percolates down by swapping with three. 212 00:09:55,770 --> 00:09:58,353 We remove three by swapping with one. 213 00:09:59,240 --> 00:10:00,600 Now we'll swap one and two 214 00:10:00,600 --> 00:10:02,513 to restore the heap order property. 215 00:10:03,950 --> 00:10:06,453 We're gonna remove two by swapping with zero. 216 00:10:08,650 --> 00:10:10,240 Now we swap zero and one 217 00:10:10,240 --> 00:10:12,223 to restore the heap order property. 218 00:10:13,730 --> 00:10:18,730 Now we'll remove one, and now we've consumed our heap, 219 00:10:19,380 --> 00:10:21,810 and our vector is sorted. 220 00:10:21,810 --> 00:10:23,590 See, zero, one, two, three four, 221 00:10:23,590 --> 00:10:24,940 five, six, seven, eight, nine. 222 00:10:24,940 --> 00:10:26,410 And we're done. 223 00:10:26,410 --> 00:10:28,380 Let's take a look at an animation 224 00:10:28,380 --> 00:10:30,973 from the University of San Francisco website. 225 00:10:33,330 --> 00:10:37,010 So now we're using the visualization tool 226 00:10:37,010 --> 00:10:39,563 at University of San Francisco. 227 00:10:41,610 --> 00:10:44,810 And you may be wondering what's going on here. 228 00:10:44,810 --> 00:10:47,050 The first thing we need to do as you recall 229 00:10:47,050 --> 00:10:50,940 is turn our vector into a binary heap. 230 00:10:50,940 --> 00:10:52,500 And so what you see right now 231 00:10:52,500 --> 00:10:55,420 is the preliminary work that's being done 232 00:10:55,420 --> 00:10:58,270 in order to convert the original input vector 233 00:10:58,270 --> 00:11:01,490 that we started with into a binary heap. 234 00:11:01,490 --> 00:11:06,050 And then once that's done, it starts the process of sorting. 235 00:11:06,050 --> 00:11:09,500 And so you see now that there are elements 236 00:11:09,500 --> 00:11:13,233 accumulating on the right-hand side that are sorted, 237 00:11:14,430 --> 00:11:16,860 and the process is identical 238 00:11:16,860 --> 00:11:20,393 to what I just showed you earlier much more slowly. 239 00:11:22,160 --> 00:11:24,190 We're removing the root element. 240 00:11:24,190 --> 00:11:27,780 We're adding that to the sorted elements, 241 00:11:27,780 --> 00:11:29,480 then we're percolating down 242 00:11:30,690 --> 00:11:35,160 until we find a suitable home for our new root, 243 00:11:35,160 --> 00:11:37,983 and thereby restore the heap order property. 244 00:11:38,930 --> 00:11:42,780 Now, you may wonder what the time complexity is here, 245 00:11:42,780 --> 00:11:46,020 and because the height of a binary tree 246 00:11:46,020 --> 00:11:48,283 within elements is log n, 247 00:11:52,130 --> 00:11:53,770 that means that the worst case 248 00:11:53,770 --> 00:11:55,550 to restore the heap order property 249 00:11:55,550 --> 00:11:58,870 is we're gonna need to perform log n swaps. 250 00:11:58,870 --> 00:12:00,500 And we have to do that n times, 251 00:12:00,500 --> 00:12:02,640 we have n elements we need to sort, 252 00:12:02,640 --> 00:12:07,200 and in each case we could perform up to log n swaps. 253 00:12:07,200 --> 00:12:10,320 And so that's how we get the time complexity 254 00:12:10,320 --> 00:12:14,163 of o of n log n for this algorithm. 255 00:12:32,230 --> 00:12:33,320 But you see this works 256 00:12:33,320 --> 00:12:36,333 just the way we demonstrated in the slides, 257 00:12:38,620 --> 00:12:40,083 just with a lot more numbers. 258 00:12:42,910 --> 00:12:45,883 And now we're done, and we have this nice sorted vector. 259 00:12:50,930 --> 00:12:54,620 Now, let's go back to our comparison chart. 260 00:12:54,620 --> 00:12:56,530 Here we've added heap sort 261 00:12:56,530 --> 00:13:00,110 among the algorithms that we've examined. 262 00:13:00,110 --> 00:13:05,110 Heap sort has a time complexity of o of n log n, 263 00:13:05,110 --> 00:13:08,260 that's the worst case and average case. 264 00:13:08,260 --> 00:13:11,190 So unlike Quicksort, it doesn't have a worst case 265 00:13:11,190 --> 00:13:12,500 of o of n squared, 266 00:13:12,500 --> 00:13:16,890 it's guaranteed to perform an o of n log n time. 267 00:13:16,890 --> 00:13:21,890 It has o of one space complexity, but it is not stable. 268 00:13:23,140 --> 00:13:25,240 Unlike selection sort, 269 00:13:25,240 --> 00:13:28,090 but the unsorted elements are in a heap. 270 00:13:28,090 --> 00:13:30,180 And so that's where we get the performance boost 271 00:13:30,180 --> 00:13:31,493 over selection sort. 272 00:13:32,880 --> 00:13:34,003 So in summary, 273 00:13:34,870 --> 00:13:37,230 heap sort is an improved selection sort 274 00:13:37,230 --> 00:13:40,680 where we use a heap to manage our unsorted elements. 275 00:13:40,680 --> 00:13:43,550 Heap sort has average case and worst case performance 276 00:13:43,550 --> 00:13:47,293 of o of n log n, and heap sort is not stable. 277 00:13:48,360 --> 00:13:52,283 And that wraps up our survey of sorting algorithms. 278 00:13:53,210 --> 00:13:54,950 In the next presentation, 279 00:13:54,950 --> 00:13:58,600 we will implement heap sort in C++, 280 00:13:58,600 --> 00:14:00,593 and then we'll move on to new topics.