1
00:00:00,590 --> 00:00:02,240
- [Instructor] The last sorting algorithm
2
00:00:02,240 --> 00:00:04,840
we'll explore is heap sort.
3
00:00:04,840 --> 00:00:07,530
Heap sort makes use of a binary heap.
4
00:00:07,530 --> 00:00:10,970
We use a binary heap to
hold our unsorted elements,
5
00:00:10,970 --> 00:00:13,240
and then remove them in sorted order
6
00:00:13,240 --> 00:00:15,163
as we would with a priority queue.
7
00:00:16,800 --> 00:00:20,210
First let's recall from
our earlier presentations,
8
00:00:20,210 --> 00:00:23,010
what exactly is a binary heap?
9
00:00:23,010 --> 00:00:26,660
A binary heap is a binary
tree with the heap properties.
10
00:00:26,660 --> 00:00:30,900
There are two, first, the
tree must be complete.
11
00:00:30,900 --> 00:00:33,540
This means that each level is full
12
00:00:33,540 --> 00:00:36,660
with the possible exception
of the last level.
13
00:00:36,660 --> 00:00:38,450
The last level may be incomplete,
14
00:00:38,450 --> 00:00:40,440
but it should be filled
from left to right,
15
00:00:40,440 --> 00:00:42,253
we call this the structure property.
16
00:00:44,280 --> 00:00:47,243
The other property is called
the heap order property,
17
00:00:48,105 --> 00:00:49,930
and that is that with the
exception of the root,
18
00:00:49,930 --> 00:00:53,053
every node must be ordered
with respect to its parent.
19
00:00:54,510 --> 00:00:56,006
Well, you'll recall again
20
00:00:56,006 --> 00:00:58,850
that we have two ways
that we can order a heap.
21
00:00:58,850 --> 00:01:00,920
We can have a min heap or a max heap.
22
00:01:00,920 --> 00:01:05,110
A min heap is where we have
the minimal value at the root,
23
00:01:05,110 --> 00:01:07,830
and then child nodes must have a value
24
00:01:07,830 --> 00:01:10,383
greater than or equal to
that of the parent node.
25
00:01:11,910 --> 00:01:14,610
In a max heap, the maximal
value is at the root,
26
00:01:14,610 --> 00:01:16,670
and the child nodes must have values
27
00:01:16,670 --> 00:01:20,000
less than or equal to
that of the parent node.
28
00:01:20,000 --> 00:01:22,060
For our implementation of heap sort,
29
00:01:22,060 --> 00:01:25,643
and for these demonstrations,
we're gonna use max heap.
30
00:01:27,410 --> 00:01:31,450
Let's also recall what happens
when we remove the root.
31
00:01:31,450 --> 00:01:35,100
Since we have a max heap, the
maximum value is at the root,
32
00:01:35,100 --> 00:01:36,350
and we're gonna sort our vector
33
00:01:36,350 --> 00:01:39,743
by repeatedly removing the
element at the root of our heap.
34
00:01:40,830 --> 00:01:43,340
Now, when we delete the
root of a binary heap,
35
00:01:43,340 --> 00:01:46,510
we have to restore the
heap order property.
36
00:01:46,510 --> 00:01:48,360
And we do that by percolating down
37
00:01:48,360 --> 00:01:51,220
until we find a new position
for the orphaned value.
38
00:01:51,220 --> 00:01:52,970
Now, what do I mean by that?
39
00:01:52,970 --> 00:01:56,150
You'll recall from our
earlier presentations
40
00:01:56,150 --> 00:01:59,360
that when we removed
an element at the root,
41
00:01:59,360 --> 00:02:01,223
we sort of left an empty bubble,
42
00:02:02,100 --> 00:02:05,300
and there was one value
in the lowest level
43
00:02:05,300 --> 00:02:08,730
of the binary heap that we
needed to find a new home for it,
44
00:02:08,730 --> 00:02:11,310
that's what we call the orphaned value.
45
00:02:11,310 --> 00:02:14,050
And we let that empty
bubble percolate down
46
00:02:14,050 --> 00:02:16,890
through the heap until we found a place
47
00:02:16,890 --> 00:02:19,113
where that orphaned value could reside.
48
00:02:20,080 --> 00:02:23,140
Now, what we're gonna do
here is we're gonna do this
49
00:02:23,140 --> 00:02:26,303
by performing a series of
swaps within our vector.
50
00:02:27,690 --> 00:02:29,190
Now, you may think of heap sort
51
00:02:29,190 --> 00:02:31,563
as an improved selection sort.
52
00:02:32,430 --> 00:02:34,220
Where a selection sort has to scan
53
00:02:34,220 --> 00:02:36,540
the entire unsorted portion of the vector
54
00:02:36,540 --> 00:02:38,020
to find the next element,
55
00:02:38,020 --> 00:02:42,250
heap sort maintains the
unsorted data in a binary heap.
56
00:02:42,250 --> 00:02:44,610
And so we can always find the next element
57
00:02:44,610 --> 00:02:47,253
because it's gonna be
at the root of the heap.
58
00:02:49,070 --> 00:02:51,040
So when we look at our vector,
59
00:02:51,040 --> 00:02:53,950
we're going to have a
portion of our vector
60
00:02:53,950 --> 00:02:57,180
which is a binary heap, and
that's our unsorted elements
61
00:02:57,180 --> 00:02:59,337
with the maximum value at the root.
62
00:02:59,337 --> 00:03:02,100
And then we're gonna have
a portion of our vector
63
00:03:02,100 --> 00:03:04,040
that has the sorted elements.
64
00:03:04,040 --> 00:03:05,760
Now, of course, when we start,
65
00:03:05,760 --> 00:03:09,050
the entire vector will be a binary heap,
66
00:03:09,050 --> 00:03:10,810
and there will be no sorted elements,
67
00:03:10,810 --> 00:03:12,980
but we'll see the sorted elements
68
00:03:12,980 --> 00:03:16,430
accumulate on the right
hand-side of our vector
69
00:03:16,430 --> 00:03:18,203
as we iterate through heap sort.
70
00:03:21,030 --> 00:03:23,720
Here is some high level pseudo code.
71
00:03:23,720 --> 00:03:25,433
Heap sort takes a vector.
72
00:03:26,310 --> 00:03:28,340
We have another function called heapify
73
00:03:28,340 --> 00:03:29,790
which also takes a vector,
74
00:03:29,790 --> 00:03:33,780
and what that does is it takes
the vector, the input vector,
75
00:03:33,780 --> 00:03:36,090
and turns it into a binary heap.
76
00:03:36,090 --> 00:03:39,900
It makes the necessary swaps,
does whatever we need to do
77
00:03:39,900 --> 00:03:44,510
to turn the input vector
into a valid binary heap.
78
00:03:44,510 --> 00:03:45,880
It just does that by swapping.
79
00:03:45,880 --> 00:03:47,590
We'll see more about how that works
80
00:03:47,590 --> 00:03:50,410
when we implement this in C++.
81
00:03:50,410 --> 00:03:52,180
And then we have a while loop.
82
00:03:52,180 --> 00:03:55,780
As long as there are unsorted
elements in our vector,
83
00:03:55,780 --> 00:03:57,780
we remove the root from the heap
84
00:03:57,780 --> 00:03:59,970
and append that to our sorted values,
85
00:03:59,970 --> 00:04:01,240
that's gonna be a maximum,
86
00:04:01,240 --> 00:04:03,520
so the largest values will accumulate
87
00:04:03,520 --> 00:04:05,430
on the right-hand side.
88
00:04:05,430 --> 00:04:06,840
And then we have to take steps
89
00:04:06,840 --> 00:04:09,060
to restore the heap order property
90
00:04:09,060 --> 00:04:11,060
on the unsorted elements.
91
00:04:11,060 --> 00:04:13,840
And we'll do that by that
percolating down process
92
00:04:13,840 --> 00:04:16,820
that we saw when we
introduced binary heaps
93
00:04:16,820 --> 00:04:17,893
a couple weeks back.
94
00:04:20,160 --> 00:04:22,550
So we're gonna need that heapify function
95
00:04:22,550 --> 00:04:25,190
and we're gonna need that
percolate down function.
96
00:04:25,190 --> 00:04:27,270
Again, you'll see more of how this works
97
00:04:27,270 --> 00:04:29,613
when we implement heap sort in C++.
98
00:04:30,960 --> 00:04:31,793
But to start,
99
00:04:31,793 --> 00:04:34,660
let's say we have this
input vector at the top,
100
00:04:34,660 --> 00:04:36,580
eight, five, zero, four, three, nine,
101
00:04:36,580 --> 00:04:38,033
one, seven, two, six.
102
00:04:39,040 --> 00:04:41,220
And then we use our heapify function
103
00:04:41,220 --> 00:04:42,500
that I referred to earlier
104
00:04:42,500 --> 00:04:46,380
to transform this vector
into a valid binary heap.
105
00:04:46,380 --> 00:04:48,930
And this is done by
performing the required swaps
106
00:04:48,930 --> 00:04:51,080
to give our vector the
heap order property.
107
00:04:52,060 --> 00:04:57,060
Now, if we run heapify, this
vector becomes transformed
108
00:04:57,210 --> 00:04:59,450
into this vector, nine, seven eight,
109
00:04:59,450 --> 00:05:01,637
five, six, zero, one, four, two, three.
110
00:05:03,730 --> 00:05:06,560
You may say, well, how do I
know that's a binary heap?
111
00:05:06,560 --> 00:05:08,160
Well, let's look at it this way.
112
00:05:09,230 --> 00:05:11,010
Here we have nine at the root,
113
00:05:11,010 --> 00:05:13,470
you'll recall there's
a very precise mapping
114
00:05:13,470 --> 00:05:16,740
between the position of elements
115
00:05:16,740 --> 00:05:21,660
in a binary heaps tree representation
116
00:05:21,660 --> 00:05:24,400
with the representation in a vector.
117
00:05:24,400 --> 00:05:28,400
So this nine here is the nine at the root.
118
00:05:28,400 --> 00:05:31,990
Then there's level below
that with seven and eight.
119
00:05:31,990 --> 00:05:35,750
Notice that seven and eight
are both less than nine.
120
00:05:35,750 --> 00:05:38,240
And then we have another level below that,
121
00:05:38,240 --> 00:05:40,240
five, six, zero, one.
122
00:05:40,240 --> 00:05:42,590
Notice that five and
six are less than seven,
123
00:05:42,590 --> 00:05:44,300
zero and one are less than eight,
124
00:05:44,300 --> 00:05:47,670
so, so far the heap order
property looks good.
125
00:05:47,670 --> 00:05:49,310
Then we have another level,
126
00:05:49,310 --> 00:05:51,450
notice that this level is not complete
127
00:05:51,450 --> 00:05:54,610
but it is filled from
the left to the right.
128
00:05:54,610 --> 00:05:56,200
So we have four, two, three,
129
00:05:56,200 --> 00:05:58,890
four and two are less than
five, we're good there,
130
00:05:58,890 --> 00:06:00,500
three is less than six.
131
00:06:00,500 --> 00:06:04,280
So that looks like a
perfectly good binary heap.
132
00:06:04,280 --> 00:06:07,293
And so this is where we're
going to start our iteration.
133
00:06:08,850 --> 00:06:11,800
So to run our algorithm in place,
134
00:06:11,800 --> 00:06:15,410
rather than just popping
things off the top of the heap,
135
00:06:16,310 --> 00:06:18,630
we're going to do a swap.
136
00:06:18,630 --> 00:06:21,580
So here, this is the value three
137
00:06:21,580 --> 00:06:26,580
that would be orphaned if we
were to remove nine, okay?
138
00:06:26,640 --> 00:06:29,980
So what we're gonna do is we're
gonna swap nine and three,
139
00:06:29,980 --> 00:06:31,000
swap nine and three,
140
00:06:31,000 --> 00:06:33,270
notice nine will go to
the end of the vector
141
00:06:33,270 --> 00:06:36,850
which is gonna become our
sorted portion of the vector.
142
00:06:36,850 --> 00:06:39,980
Three will swap with nine,
that will appear at the root,
143
00:06:39,980 --> 00:06:41,470
and then we're gonna have to restore
144
00:06:41,470 --> 00:06:42,723
the heap order property.
145
00:06:44,070 --> 00:06:46,370
So now we've swapped nine and three,
146
00:06:46,370 --> 00:06:48,750
we've marked nine as sorted.
147
00:06:48,750 --> 00:06:52,100
Notice that nine is no longer
considered part of the heap,
148
00:06:52,100 --> 00:06:55,630
but we have that problem, having
three in the root position
149
00:06:55,630 --> 00:06:58,520
as we do here violates
the heap order property.
150
00:06:58,520 --> 00:07:00,440
So we'll percolate this value down
151
00:07:00,440 --> 00:07:02,850
until we find a suitable place for it.
152
00:07:02,850 --> 00:07:05,610
And we'll do that by performing swaps.
153
00:07:05,610 --> 00:07:08,240
Now, here, it's easy,
we just have one step.
154
00:07:08,240 --> 00:07:10,140
We swapped three with eight,
155
00:07:10,140 --> 00:07:13,190
and that's gonna give us a binary heap.
156
00:07:13,190 --> 00:07:15,530
So here we've swapped three and eight.
157
00:07:15,530 --> 00:07:18,700
Now eight is in the root position.
158
00:07:18,700 --> 00:07:23,170
Three is below eight, and
below three are zero and one,
159
00:07:23,170 --> 00:07:25,183
heap order property is intact.
160
00:07:26,960 --> 00:07:28,810
And the next element
that we're gonna remove
161
00:07:28,810 --> 00:07:30,273
from the heap is eight.
162
00:07:31,200 --> 00:07:32,670
So what are we gonna do with eight?
163
00:07:32,670 --> 00:07:34,683
We're gonna swap it with two.
164
00:07:36,060 --> 00:07:37,960
So now we've swapped eight and two,
165
00:07:37,960 --> 00:07:40,050
and we mark eight as being sorted.
166
00:07:40,050 --> 00:07:41,760
But now two is at the root,
167
00:07:41,760 --> 00:07:43,680
and we need to restore
the heap order property.
168
00:07:43,680 --> 00:07:47,310
So again, we percolate down,
choosing the larger element,
169
00:07:47,310 --> 00:07:49,150
which one of these do we swap with?
170
00:07:49,150 --> 00:07:52,440
Seven and three, we swap
with the larger, okay?
171
00:07:52,440 --> 00:07:54,773
So we're gonna swap seven and two.
172
00:07:56,610 --> 00:07:59,330
Now we've swapped seven and
two, do we have a heap yet?
173
00:07:59,330 --> 00:08:00,540
No, why?
174
00:08:00,540 --> 00:08:03,290
Because five and six are greater than two.
175
00:08:03,290 --> 00:08:06,310
The heap order property is not intact.
176
00:08:06,310 --> 00:08:07,360
So we're gonna swap
177
00:08:07,360 --> 00:08:10,143
with the larger of those two, two and six.
178
00:08:11,320 --> 00:08:13,950
So we swap two and six,
179
00:08:13,950 --> 00:08:16,910
and now the heap order
property is restored,
180
00:08:16,910 --> 00:08:18,363
and seven is at the root.
181
00:08:20,090 --> 00:08:21,860
Now, in the next iteration,
182
00:08:21,860 --> 00:08:26,573
we're going to swap seven with
four, seven and four, okay?
183
00:08:29,580 --> 00:08:31,363
So we'll swap seven and four.
184
00:08:32,860 --> 00:08:36,630
Now seven is sorted and four
is at the root position.
185
00:08:36,630 --> 00:08:39,133
Again, we need to restore
the heap order property.
186
00:08:40,300 --> 00:08:42,403
So we'll swap four and six,
187
00:08:43,640 --> 00:08:45,180
and we'll need to swap one more time
188
00:08:45,180 --> 00:08:47,163
to restore the heap order property.
189
00:08:48,380 --> 00:08:49,993
We'll swap four and five.
190
00:08:51,330 --> 00:08:53,180
And now we have three elements sorted,
191
00:08:53,180 --> 00:08:54,550
seven, eight and nine,
192
00:08:54,550 --> 00:08:57,053
and we've consumed the
lowest level of our heap.
193
00:08:58,010 --> 00:09:00,800
So to remove six and add
it to our sorted portion,
194
00:09:00,800 --> 00:09:02,253
we'll swap six and one.
195
00:09:04,440 --> 00:09:05,780
I think you get the idea now,
196
00:09:05,780 --> 00:09:08,430
so let's move through the
rest a little more quickly.
197
00:09:09,770 --> 00:09:11,463
We'll swap one and five.
198
00:09:12,930 --> 00:09:14,683
We'll swap one and four,
199
00:09:15,700 --> 00:09:17,900
and now we've restored
the heap order property
200
00:09:17,900 --> 00:09:19,740
and five is at the root.
201
00:09:19,740 --> 00:09:23,083
Now we're gonna remove five
by swapping it with zero.
202
00:09:25,420 --> 00:09:28,450
Now, five has been swapped
with zero, five is sorted.
203
00:09:28,450 --> 00:09:31,250
Now we have five, six,
seven, eight, nine sorted.
204
00:09:31,250 --> 00:09:32,500
Zero is at the root,
205
00:09:32,500 --> 00:09:34,853
so we have to restore
the heap order property.
206
00:09:35,910 --> 00:09:37,923
So we swap zero and four.
207
00:09:39,210 --> 00:09:41,173
Now we swap zero and two,
208
00:09:43,230 --> 00:09:46,900
and now we have a valid binary heap.
209
00:09:46,900 --> 00:09:49,160
Now we're gonna remove
four but from the heap
210
00:09:49,160 --> 00:09:50,663
by swapping it with zero,
211
00:09:52,050 --> 00:09:55,770
and zero percolates down
by swapping with three.
212
00:09:55,770 --> 00:09:58,353
We remove three by swapping with one.
213
00:09:59,240 --> 00:10:00,600
Now we'll swap one and two
214
00:10:00,600 --> 00:10:02,513
to restore the heap order property.
215
00:10:03,950 --> 00:10:06,453
We're gonna remove two
by swapping with zero.
216
00:10:08,650 --> 00:10:10,240
Now we swap zero and one
217
00:10:10,240 --> 00:10:12,223
to restore the heap order property.
218
00:10:13,730 --> 00:10:18,730
Now we'll remove one, and
now we've consumed our heap,
219
00:10:19,380 --> 00:10:21,810
and our vector is sorted.
220
00:10:21,810 --> 00:10:23,590
See, zero, one, two, three four,
221
00:10:23,590 --> 00:10:24,940
five, six, seven, eight, nine.
222
00:10:24,940 --> 00:10:26,410
And we're done.
223
00:10:26,410 --> 00:10:28,380
Let's take a look at an animation
224
00:10:28,380 --> 00:10:30,973
from the University of
San Francisco website.
225
00:10:33,330 --> 00:10:37,010
So now we're using the visualization tool
226
00:10:37,010 --> 00:10:39,563
at University of San Francisco.
227
00:10:41,610 --> 00:10:44,810
And you may be wondering
what's going on here.
228
00:10:44,810 --> 00:10:47,050
The first thing we need
to do as you recall
229
00:10:47,050 --> 00:10:50,940
is turn our vector into a binary heap.
230
00:10:50,940 --> 00:10:52,500
And so what you see right now
231
00:10:52,500 --> 00:10:55,420
is the preliminary work that's being done
232
00:10:55,420 --> 00:10:58,270
in order to convert the
original input vector
233
00:10:58,270 --> 00:11:01,490
that we started with into a binary heap.
234
00:11:01,490 --> 00:11:06,050
And then once that's done, it
starts the process of sorting.
235
00:11:06,050 --> 00:11:09,500
And so you see now that there are elements
236
00:11:09,500 --> 00:11:13,233
accumulating on the right-hand
side that are sorted,
237
00:11:14,430 --> 00:11:16,860
and the process is identical
238
00:11:16,860 --> 00:11:20,393
to what I just showed you
earlier much more slowly.
239
00:11:22,160 --> 00:11:24,190
We're removing the root element.
240
00:11:24,190 --> 00:11:27,780
We're adding that to the sorted elements,
241
00:11:27,780 --> 00:11:29,480
then we're percolating down
242
00:11:30,690 --> 00:11:35,160
until we find a suitable
home for our new root,
243
00:11:35,160 --> 00:11:37,983
and thereby restore the
heap order property.
244
00:11:38,930 --> 00:11:42,780
Now, you may wonder what
the time complexity is here,
245
00:11:42,780 --> 00:11:46,020
and because the height of a binary tree
246
00:11:46,020 --> 00:11:48,283
within elements is log n,
247
00:11:52,130 --> 00:11:53,770
that means that the worst case
248
00:11:53,770 --> 00:11:55,550
to restore the heap order property
249
00:11:55,550 --> 00:11:58,870
is we're gonna need to
perform log n swaps.
250
00:11:58,870 --> 00:12:00,500
And we have to do that n times,
251
00:12:00,500 --> 00:12:02,640
we have n elements we need to sort,
252
00:12:02,640 --> 00:12:07,200
and in each case we could
perform up to log n swaps.
253
00:12:07,200 --> 00:12:10,320
And so that's how we
get the time complexity
254
00:12:10,320 --> 00:12:14,163
of o of n log n for this algorithm.
255
00:12:32,230 --> 00:12:33,320
But you see this works
256
00:12:33,320 --> 00:12:36,333
just the way we
demonstrated in the slides,
257
00:12:38,620 --> 00:12:40,083
just with a lot more numbers.
258
00:12:42,910 --> 00:12:45,883
And now we're done, and we
have this nice sorted vector.
259
00:12:50,930 --> 00:12:54,620
Now, let's go back to
our comparison chart.
260
00:12:54,620 --> 00:12:56,530
Here we've added heap sort
261
00:12:56,530 --> 00:13:00,110
among the algorithms that we've examined.
262
00:13:00,110 --> 00:13:05,110
Heap sort has a time
complexity of o of n log n,
263
00:13:05,110 --> 00:13:08,260
that's the worst case and average case.
264
00:13:08,260 --> 00:13:11,190
So unlike Quicksort, it
doesn't have a worst case
265
00:13:11,190 --> 00:13:12,500
of o of n squared,
266
00:13:12,500 --> 00:13:16,890
it's guaranteed to perform
an o of n log n time.
267
00:13:16,890 --> 00:13:21,890
It has o of one space
complexity, but it is not stable.
268
00:13:23,140 --> 00:13:25,240
Unlike selection sort,
269
00:13:25,240 --> 00:13:28,090
but the unsorted elements are in a heap.
270
00:13:28,090 --> 00:13:30,180
And so that's where we
get the performance boost
271
00:13:30,180 --> 00:13:31,493
over selection sort.
272
00:13:32,880 --> 00:13:34,003
So in summary,
273
00:13:34,870 --> 00:13:37,230
heap sort is an improved selection sort
274
00:13:37,230 --> 00:13:40,680
where we use a heap to
manage our unsorted elements.
275
00:13:40,680 --> 00:13:43,550
Heap sort has average case
and worst case performance
276
00:13:43,550 --> 00:13:47,293
of o of n log n, and
heap sort is not stable.
277
00:13:48,360 --> 00:13:52,283
And that wraps up our survey
of sorting algorithms.
278
00:13:53,210 --> 00:13:54,950
In the next presentation,
279
00:13:54,950 --> 00:13:58,600
we will implement heap sort in C++,
280
00:13:58,600 --> 00:14:00,593
and then we'll move on to new topics.