1
00:00:00,640 --> 00:00:02,640
- [Instructor] Merge Sort.
2
00:00:02,640 --> 00:00:05,000
We start here on this
slide with a photograph
3
00:00:05,000 --> 00:00:07,150
of computing pioneer John von Neumann
4
00:00:07,150 --> 00:00:09,410
standing next to part
of one of the computers
5
00:00:09,410 --> 00:00:10,513
he helped design.
6
00:00:11,790 --> 00:00:14,100
I show his photo here
because he was the inventor
7
00:00:14,100 --> 00:00:17,210
of merge sort back in 1945.
8
00:00:17,210 --> 00:00:20,820
Now merge sort is not like
the other sorting algorithms
9
00:00:20,820 --> 00:00:22,000
that you've seen so far.
10
00:00:22,000 --> 00:00:24,393
It's a very different kind of animal.
11
00:00:25,370 --> 00:00:29,310
Merge sort is a
divide-and-conquer algorithm,
12
00:00:29,310 --> 00:00:31,120
and divide-and-conquer algorithms
13
00:00:31,120 --> 00:00:33,650
break down a problem recursively,
14
00:00:33,650 --> 00:00:36,220
and then combine the
results of subproblems
15
00:00:36,220 --> 00:00:37,963
to produce the final result.
16
00:00:40,100 --> 00:00:42,720
Now, let's think back to
the first week of the course
17
00:00:42,720 --> 00:00:44,113
when we covered recursion.
18
00:00:45,030 --> 00:00:48,390
As you recall, recursive
functions call themselves,
19
00:00:48,390 --> 00:00:50,240
that's what makes them recursive,
20
00:00:50,240 --> 00:00:54,300
and recursive functions
require one or more base cases.
21
00:00:54,300 --> 00:00:55,410
Without a base case,
22
00:00:55,410 --> 00:00:58,190
a recursive function could
call itself indefinitely,
23
00:00:58,190 --> 00:00:59,720
and never terminate.
24
00:00:59,720 --> 00:01:01,280
Hence we need the base cases.
25
00:01:01,280 --> 00:01:03,393
The base cases are not recursive.
26
00:01:04,570 --> 00:01:07,230
The examples we've seen
earlier in the course
27
00:01:07,230 --> 00:01:09,563
are Fibonacci and vectorial.
28
00:01:10,530 --> 00:01:14,020
Let's refresh our memory
with regard to Fibonacci.
29
00:01:14,020 --> 00:01:16,790
The Fibonacci sequence
starts with zero and one,
30
00:01:16,790 --> 00:01:20,340
and then calculates subsequent
terms by taking the sum
31
00:01:20,340 --> 00:01:23,200
of the previous two terms in the sequence.
32
00:01:23,200 --> 00:01:26,200
So a Fibonacci number, denoted F sub n,
33
00:01:26,200 --> 00:01:29,540
where n is an index, is calculated thus.
34
00:01:29,540 --> 00:01:31,010
F sub n is zero.
35
00:01:31,010 --> 00:01:33,880
If n is zero F sub n is one.
36
00:01:33,880 --> 00:01:38,880
If n is 1 then F sub n
is equal to F sub n - 2
37
00:01:39,490 --> 00:01:42,283
plus F sub n - 1 otherwise.
38
00:01:43,990 --> 00:01:46,173
So these are our base cases,
39
00:01:47,550 --> 00:01:49,860
and this is the recursive case.
40
00:01:49,860 --> 00:01:52,360
And we can think of this as
breaking the problem down
41
00:01:52,360 --> 00:01:54,410
into smaller subproblems.
42
00:01:54,410 --> 00:01:57,020
The nth Fibonacci number is the sum
43
00:01:57,020 --> 00:01:59,540
of two smaller Fibonacci numbers,
44
00:01:59,540 --> 00:02:02,480
F sub n - 2 and F sub n - 1.
45
00:02:02,480 --> 00:02:05,090
And the solutions to
these smaller problems
46
00:02:05,090 --> 00:02:08,260
are combined to produce the final result.
47
00:02:08,260 --> 00:02:12,423
Here, we combined the solutions
to sub problems by addition.
48
00:02:14,620 --> 00:02:16,150
But what about merge sort?
49
00:02:16,150 --> 00:02:18,470
Well, merge sort breaks the problem
50
00:02:18,470 --> 00:02:21,190
of sorting down into smaller subproblems.
51
00:02:21,190 --> 00:02:23,350
This is done recursively.
52
00:02:23,350 --> 00:02:24,913
Now we need a base case.
53
00:02:26,100 --> 00:02:28,713
Base case is a vector of a single element.
54
00:02:29,570 --> 00:02:32,580
The single element is
sorted by definition.
55
00:02:32,580 --> 00:02:34,580
Now that may seem trivial,
56
00:02:34,580 --> 00:02:35,620
but it is, in fact,
57
00:02:35,620 --> 00:02:37,173
the base case for merge sort.
58
00:02:39,030 --> 00:02:41,080
And how do we produce a solution?
59
00:02:41,080 --> 00:02:42,520
In the case of Fibonacci
60
00:02:42,520 --> 00:02:44,710
we just added the two subsolutions,
61
00:02:44,710 --> 00:02:47,550
but we can't do that when we're sorting.
62
00:02:47,550 --> 00:02:48,673
So what do we do?
63
00:02:49,990 --> 00:02:50,940
We merge.
64
00:02:50,940 --> 00:02:53,193
And this is where merge
sort gets it's name.
65
00:02:54,230 --> 00:02:55,550
What is merging?
66
00:02:55,550 --> 00:02:58,260
Well, here's a simple case.
67
00:02:58,260 --> 00:03:00,800
We have 2 one element vectors,
68
00:03:00,800 --> 00:03:05,050
and we merge them to produce
1 two element vectors.
69
00:03:05,050 --> 00:03:08,670
Notice that both of the
source vectors we're emerging,
70
00:03:08,670 --> 00:03:10,410
are already sorted.
71
00:03:10,410 --> 00:03:12,010
They're one element vectors,
72
00:03:12,010 --> 00:03:13,560
and they are sorted by default.
73
00:03:14,460 --> 00:03:16,287
Now you're probably thinking,
74
00:03:16,287 --> 00:03:17,980
"Oh, that's just brilliant,
75
00:03:17,980 --> 00:03:19,810
but how does this help me sort a vector
76
00:03:19,810 --> 00:03:21,830
of a million elements?"
77
00:03:21,830 --> 00:03:24,330
All I can say right now is bear with me,
78
00:03:24,330 --> 00:03:26,013
it'll all become clear in time.
79
00:03:28,120 --> 00:03:30,240
So here's a more complicated example.
80
00:03:30,240 --> 00:03:33,450
Here we have to compare the
single element on the left
81
00:03:33,450 --> 00:03:35,150
with the single element on the right,
82
00:03:35,150 --> 00:03:37,810
and we use this comparison
to merge the elements
83
00:03:37,810 --> 00:03:40,543
from the two source vectors
in the correct order.
84
00:03:41,980 --> 00:03:43,003
So far, so good.
85
00:03:44,770 --> 00:03:46,670
Here's another example.
86
00:03:46,670 --> 00:03:49,570
We've already merged 4 one element vectors
87
00:03:49,570 --> 00:03:51,510
into 2 two element vectors,
88
00:03:51,510 --> 00:03:55,590
and now we want to merge them
into 1 four element vector.
89
00:03:55,590 --> 00:03:58,000
Now this case is pretty simple.
90
00:03:58,000 --> 00:04:00,350
But what if we wanted to write a function
91
00:04:00,350 --> 00:04:03,460
that can merge any two sorted vectors
92
00:04:03,460 --> 00:04:06,220
into a single, larger sorted vector?
93
00:04:06,220 --> 00:04:07,353
How would that work?
94
00:04:08,410 --> 00:04:09,880
Obviously it would have to be able
95
00:04:09,880 --> 00:04:12,360
to handle a case like this,
96
00:04:12,360 --> 00:04:17,010
but notice that the two input
vectors are sorted already.
97
00:04:17,010 --> 00:04:19,453
So, we can use that information.
98
00:04:22,130 --> 00:04:24,300
We know those two input
vectors are sorted.
99
00:04:24,300 --> 00:04:27,950
So the smallest value in each
will be the left-most position
100
00:04:27,950 --> 00:04:29,283
so we can start there.
101
00:04:30,230 --> 00:04:31,960
So let's begin by figuring out
102
00:04:31,960 --> 00:04:33,290
what should be the first element
103
00:04:33,290 --> 00:04:35,040
in the result vector at the bottom.
104
00:04:35,994 --> 00:04:37,290
Now we just compare the two values
105
00:04:37,290 --> 00:04:40,100
at the indicated positions
in the source vectors,
106
00:04:40,100 --> 00:04:42,563
and put that value into our result vector.
107
00:04:43,560 --> 00:04:44,543
Simple, one.
108
00:04:46,040 --> 00:04:46,873
Now what?
109
00:04:46,873 --> 00:04:48,590
Well, we've used an element
110
00:04:48,590 --> 00:04:50,910
from the right-hand source vector,
111
00:04:50,910 --> 00:04:53,450
and we don't want to
use that element again.
112
00:04:53,450 --> 00:04:55,483
So let's move that arrow to the right.
113
00:04:57,830 --> 00:05:00,600
And we want to move the
arrow on the result vector
114
00:05:00,600 --> 00:05:02,573
to point to the next empty position.
115
00:05:03,410 --> 00:05:06,450
And now, we repeat what we did before.
116
00:05:06,450 --> 00:05:08,280
We compare the elements indicated
117
00:05:08,280 --> 00:05:09,930
by the arrows and the source vectors,
118
00:05:09,930 --> 00:05:11,560
and choose the smaller of the two.
119
00:05:11,560 --> 00:05:13,593
Here the value is two.
120
00:05:14,950 --> 00:05:17,820
We put that into our result vector.
121
00:05:17,820 --> 00:05:22,733
And then, we move the arrows
as we did before and repeat.
122
00:05:24,250 --> 00:05:26,780
Again, we choose the smaller
of the two values three,
123
00:05:26,780 --> 00:05:29,780
and put it into the result
vector in the indicated position.
124
00:05:31,300 --> 00:05:34,120
We've used up all the values
in the left-hand source vector.
125
00:05:34,120 --> 00:05:36,130
So we're not going to
choose any more values
126
00:05:36,130 --> 00:05:37,083
from that vector.
127
00:05:38,030 --> 00:05:40,700
And, we only have one value to choose
128
00:05:40,700 --> 00:05:42,460
in the right-hand side.
129
00:05:42,460 --> 00:05:45,703
So we're gonna pick that to
complete our result vector.
130
00:05:47,960 --> 00:05:49,150
And our merge is complete.
131
00:05:49,150 --> 00:05:50,900
Now that wasn't too difficult.
132
00:05:50,900 --> 00:05:54,360
The important thing is that
this process works just as well
133
00:05:54,360 --> 00:05:56,390
with vectors of any size.
134
00:05:56,390 --> 00:05:59,770
If we want to merge 2
million element vectors,
135
00:05:59,770 --> 00:06:01,700
the process would be the same.
136
00:06:01,700 --> 00:06:02,660
Take a little longer,
137
00:06:02,660 --> 00:06:04,360
but the process would be the same.
138
00:06:06,040 --> 00:06:07,870
Now, for efficiency's sake,
139
00:06:07,870 --> 00:06:10,390
we can use a single vector as a source,
140
00:06:10,390 --> 00:06:13,823
so long as it consists
of two sorted subvectors.
141
00:06:14,870 --> 00:06:18,163
So, let's expand on
our example just a bit.
142
00:06:19,420 --> 00:06:23,120
Here we have our two input vectors,
143
00:06:23,120 --> 00:06:27,620
actually they're sorted
subvectors of four elements each,
144
00:06:27,620 --> 00:06:30,110
and we want to merge
these into a single vector
145
00:06:30,110 --> 00:06:31,783
of eight sorted elements.
146
00:06:32,740 --> 00:06:35,900
We need to keep track of where
our subvectors begin and end
147
00:06:35,900 --> 00:06:38,805
so we know which is the left subvector,
148
00:06:38,805 --> 00:06:40,870
and which is the right subvector.
149
00:06:40,870 --> 00:06:44,410
And so, the input on the
left-hand side starts at start,
150
00:06:44,410 --> 00:06:46,210
we're calling that position start,
151
00:06:46,210 --> 00:06:47,453
and runs to middle.
152
00:06:48,300 --> 00:06:50,170
And the input on the right-hand side
153
00:06:50,170 --> 00:06:52,443
runs from middle plus one to the end.
154
00:06:53,890 --> 00:06:54,850
Then, as before,
155
00:06:54,850 --> 00:06:56,520
we have three indices,
156
00:06:56,520 --> 00:06:58,370
one into the left-hand input,
157
00:06:58,370 --> 00:06:59,920
we'll call that i,
158
00:06:59,920 --> 00:07:01,600
one into the right-hand input,
159
00:07:01,600 --> 00:07:03,340
We'll call that j,
160
00:07:03,340 --> 00:07:05,070
and one into the result vector,
161
00:07:05,070 --> 00:07:06,530
We'll call that k.
162
00:07:06,530 --> 00:07:07,913
So i, j, and k.
163
00:07:09,000 --> 00:07:11,580
With this setup, everything
proceeds as it did
164
00:07:11,580 --> 00:07:13,510
in our earlier example.
165
00:07:13,510 --> 00:07:15,500
What goes into position k?
166
00:07:15,500 --> 00:07:18,450
The smaller of the values
indexed by i and j.
167
00:07:18,450 --> 00:07:19,433
In this case one.
168
00:07:20,760 --> 00:07:22,400
We increment our indices,
169
00:07:22,400 --> 00:07:24,803
adding one to j and one to k.
170
00:07:25,650 --> 00:07:27,120
What's next?
171
00:07:27,120 --> 00:07:29,660
Well, two's smaller than
three so we choose two
172
00:07:29,660 --> 00:07:30,933
from the right-hand side.
173
00:07:31,960 --> 00:07:34,180
And we increment our indices.
174
00:07:34,180 --> 00:07:35,540
What's next?
175
00:07:35,540 --> 00:07:38,270
We choose three from the left-hand side,
176
00:07:38,270 --> 00:07:40,573
and we increment the appropriate indices.
177
00:07:42,260 --> 00:07:45,290
Now four from the right-hand side.
178
00:07:45,290 --> 00:07:46,203
Five,
179
00:07:47,230 --> 00:07:48,223
six,
180
00:07:49,390 --> 00:07:51,583
seven from the right hand side,
181
00:07:52,490 --> 00:07:54,453
and eight from the left-hand side.
182
00:07:56,250 --> 00:07:59,460
And now we can copy the
values that we've produced
183
00:07:59,460 --> 00:08:01,110
back to the original input vector,
184
00:08:01,110 --> 00:08:05,163
and we have a sorted, merged vector.
185
00:08:07,740 --> 00:08:10,560
So merge sort works by
breaking the problem down
186
00:08:10,560 --> 00:08:12,960
into single-element vectors,
187
00:08:12,960 --> 00:08:14,740
or single-element subvectors,
188
00:08:14,740 --> 00:08:16,460
as the case may be.
189
00:08:16,460 --> 00:08:19,720
And then building a solution
through successive merges.
190
00:08:19,720 --> 00:08:22,630
And merging really is at
the heart of this algorithm.
191
00:08:22,630 --> 00:08:24,800
Breaking the problem
down into its components
192
00:08:24,800 --> 00:08:25,853
is almost trivial.
193
00:08:28,330 --> 00:08:30,410
So here's the big picture.
194
00:08:30,410 --> 00:08:35,050
The algorithm splits the input
into one element vectors.
195
00:08:35,050 --> 00:08:38,000
A one element vector
represents the base case
196
00:08:38,000 --> 00:08:39,060
in the recursions.
197
00:08:39,060 --> 00:08:41,890
And so one element
vector is already sorted.
198
00:08:41,890 --> 00:08:44,820
And then using the merge
algorithm we just saw,
199
00:08:44,820 --> 00:08:48,700
merge sort builds the solution
from the subsolutions.
200
00:08:48,700 --> 00:08:50,243
Divide-and-conquer.
201
00:08:52,170 --> 00:08:53,980
You probably know the Simpsons,
202
00:08:53,980 --> 00:08:55,670
and this was intended as a gag,
203
00:08:55,670 --> 00:08:56,920
but if you look carefully,
204
00:08:56,920 --> 00:09:00,110
Marge sort is exactly the
same thing as merge sort.
205
00:09:00,110 --> 00:09:03,480
And this diagram is an accurate
depiction of merge sort.
206
00:09:03,480 --> 00:09:04,883
It's a perfect example.
207
00:09:05,950 --> 00:09:10,260
We break the problem up
into vectors of length one.
208
00:09:10,260 --> 00:09:14,683
Then we merge them putting items
into the appropriate order.
209
00:09:15,610 --> 00:09:17,010
Divide-and-conquer.
210
00:09:17,010 --> 00:09:18,703
Marge sort is merge sort.
211
00:09:21,140 --> 00:09:22,970
Here's some pseudo-code.
212
00:09:22,970 --> 00:09:24,830
First we have the base case
213
00:09:24,830 --> 00:09:28,590
where the vector contains just
one element and we return.
214
00:09:28,590 --> 00:09:31,960
Otherwise, we divide the
input into two parts,
215
00:09:31,960 --> 00:09:34,010
and make two recursive calls,
216
00:09:34,010 --> 00:09:35,170
one on the left part,
217
00:09:35,170 --> 00:09:36,920
and one on the right part.
218
00:09:36,920 --> 00:09:39,270
And when these recursive calls return,
219
00:09:39,270 --> 00:09:40,773
we merge the results.
220
00:09:42,010 --> 00:09:43,263
That's merge sort.
221
00:09:44,910 --> 00:09:46,820
Now what about complexity?
222
00:09:46,820 --> 00:09:49,550
Well merge sort is very efficient.
223
00:09:49,550 --> 00:09:51,200
Dividing the vector is simple.
224
00:09:51,200 --> 00:09:53,330
We just calculate the
midpoint of a vector,
225
00:09:53,330 --> 00:09:55,260
and divide it into two parts.
226
00:09:55,260 --> 00:09:56,853
This takes constant time.
227
00:09:57,750 --> 00:09:58,773
Big O of one.
228
00:09:59,700 --> 00:10:01,340
Merging, as you have seen,
229
00:10:01,340 --> 00:10:03,180
can be done in linear time.
230
00:10:03,180 --> 00:10:04,400
Given n elements,
231
00:10:04,400 --> 00:10:06,440
we can merge them in n steps.
232
00:10:06,440 --> 00:10:07,493
That's big O of n.
233
00:10:08,640 --> 00:10:11,020
But we have to account for the recursion.
234
00:10:11,020 --> 00:10:13,040
Whatever time it takes to perform
235
00:10:13,040 --> 00:10:16,230
division and merging at a given level,
236
00:10:16,230 --> 00:10:18,950
it will take two times the
amount of time it takes
237
00:10:18,950 --> 00:10:22,410
to perform the same work
on a vector half the size,
238
00:10:22,410 --> 00:10:24,150
because we have two subvectors,
239
00:10:24,150 --> 00:10:26,170
for the recursive call.
240
00:10:26,170 --> 00:10:27,963
So we have this recurrence.
241
00:10:29,310 --> 00:10:31,210
T of n, where T is the time
242
00:10:31,210 --> 00:10:34,870
it takes to process input of n elements,
243
00:10:34,870 --> 00:10:36,760
is equal to two times
244
00:10:36,760 --> 00:10:40,690
the time it takes to process
an element of half the size,
245
00:10:40,690 --> 00:10:43,770
plus what it takes to
process the current level.
246
00:10:43,770 --> 00:10:45,990
Now if we solve this recurrence,
247
00:10:45,990 --> 00:10:48,363
we get big O of n log n.
248
00:10:49,370 --> 00:10:52,560
You'll recall that a complete
binary tree of n elements
249
00:10:52,560 --> 00:10:54,000
has log n levels,
250
00:10:54,000 --> 00:10:55,700
and that's part of the reason why
251
00:10:55,700 --> 00:10:58,950
this recurrence is solved this way.
252
00:10:58,950 --> 00:11:01,420
Because as we break our problem down
253
00:11:01,420 --> 00:11:04,130
into sub trees and then merge them,
254
00:11:04,130 --> 00:11:06,640
we have log n levels to work with.
255
00:11:09,233 --> 00:11:10,460
Is merge sort stable?
256
00:11:10,460 --> 00:11:11,293
Yes.
257
00:11:11,293 --> 00:11:14,540
Merge sort preserves the
relative order of equal values
258
00:11:14,540 --> 00:11:15,673
in the input vector.
259
00:11:17,130 --> 00:11:20,390
Let's go back to that comparison
table we've seen before,
260
00:11:20,390 --> 00:11:22,153
now with merge sort added.
261
00:11:23,730 --> 00:11:26,800
You'll see that merge
sort has time complexity
262
00:11:26,800 --> 00:11:29,253
of O of n log n.
263
00:11:30,270 --> 00:11:33,130
But it also has space complexity of O of n
264
00:11:33,130 --> 00:11:36,120
which is more than the
other sort algorithms
265
00:11:36,120 --> 00:11:37,820
that we've seen so far.
266
00:11:37,820 --> 00:11:40,150
That's because we need
that temporary vector
267
00:11:40,150 --> 00:11:41,953
to hold our merge results.
268
00:11:42,830 --> 00:11:46,113
It's stable and it's a
divide-and-conquer algorithm.
269
00:11:48,050 --> 00:11:51,283
Remember this plot from our
introduction to complexity?
270
00:11:52,380 --> 00:11:55,460
O of n squared
271
00:11:55,460 --> 00:11:58,120
is this solid
272
00:11:58,120 --> 00:11:59,993
orange curve here.
273
00:12:00,840 --> 00:12:03,960
And O of n log n
274
00:12:03,960 --> 00:12:07,300
is this dashed green curve.
275
00:12:07,300 --> 00:12:10,850
And from this, you can
see that O of n log n
276
00:12:10,850 --> 00:12:14,230
grows much more slowly
than O of n squared.
277
00:12:14,230 --> 00:12:16,610
So merge sort is much faster
278
00:12:16,610 --> 00:12:20,073
than any of the other sorting
algorithms we've seen so far.
279
00:12:22,440 --> 00:12:26,710
So merge sort is a recursive,
divide-and-conquer algorithm.
280
00:12:26,710 --> 00:12:29,050
We'll see other
divide-and-conquer algorithms
281
00:12:29,050 --> 00:12:29,933
in this course.
282
00:12:30,930 --> 00:12:34,763
Merge sort has O of n
log n time complexity.
283
00:12:35,690 --> 00:12:39,800
Merge sort has O of n space complexity.
284
00:12:39,800 --> 00:12:41,163
And merge sort is stable.
285
00:12:42,200 --> 00:12:47,200
In the next video, we will
implement merge sort in C++.
286
00:12:48,500 --> 00:12:49,450
That's all for now.