1
00:00:00,870 --> 00:00:03,063
- [Instructor] Bucket Sort and Radix Sort.
2
00:00:05,130 --> 00:00:06,650
Bucket sort and radix sort
3
00:00:06,650 --> 00:00:08,860
work by distributing and collecting
4
00:00:08,860 --> 00:00:10,910
the elements to be sorted.
5
00:00:10,910 --> 00:00:13,530
They don't perform comparisons.
6
00:00:13,530 --> 00:00:15,260
This is a different approach
7
00:00:15,260 --> 00:00:17,233
from any that we've seen so far.
8
00:00:18,360 --> 00:00:20,590
All the algorithms we've seen so far
9
00:00:20,590 --> 00:00:23,520
use comparison to perform the sort
10
00:00:23,520 --> 00:00:25,730
and these algorithms can't do any better
11
00:00:25,730 --> 00:00:27,283
than big O of n log n.
12
00:00:28,800 --> 00:00:32,660
But bucket sort and radix sort
don't perform comparisons.
13
00:00:32,660 --> 00:00:34,820
They're sorting algorithms
that work best when
14
00:00:34,820 --> 00:00:38,400
we have more or less
uniformly distributed data.
15
00:00:38,400 --> 00:00:40,340
And we'll see that bucket sort works best
16
00:00:40,340 --> 00:00:43,523
when we have a limited
range of possible values.
17
00:00:46,240 --> 00:00:48,300
Here's pseudocode for bucket sort
18
00:00:48,300 --> 00:00:50,630
adapted from the textbook.
19
00:00:50,630 --> 00:00:52,700
We take an input vector.
20
00:00:52,700 --> 00:00:54,260
We make some buckets
21
00:00:54,260 --> 00:00:57,470
to places where we're
going to hold elements.
22
00:00:57,470 --> 00:01:00,000
We distribute the
elements into the buckets
23
00:01:00,850 --> 00:01:03,653
then we sort the items within each bucket.
24
00:01:04,500 --> 00:01:06,660
And then ,we gather the results back
25
00:01:06,660 --> 00:01:08,123
into the original vector.
26
00:01:08,980 --> 00:01:11,880
Needless to say, this pseudocode here
27
00:01:11,880 --> 00:01:14,053
sweeps a lot of detail under the rug.
28
00:01:15,650 --> 00:01:17,660
Let's look at this in
a little more detail.
29
00:01:17,660 --> 00:01:19,940
This again is adapted from the textbook
30
00:01:19,940 --> 00:01:21,633
figure six dash 12.
31
00:01:23,440 --> 00:01:26,880
We start with an unsorted
vector of values.
32
00:01:26,880 --> 00:01:28,920
Notice that we have 10 values
33
00:01:28,920 --> 00:01:31,830
and their range is relatively restricted.
34
00:01:31,830 --> 00:01:36,710
All of these values fall
within the range zero to 100.
35
00:01:36,710 --> 00:01:39,003
We don't have 10 million, for example.
36
00:01:39,910 --> 00:01:41,710
First we distribute the elements
37
00:01:41,710 --> 00:01:44,090
into the appropriate buckets.
38
00:01:44,090 --> 00:01:48,910
This first bucket here,
holds values from zero to 19.
39
00:01:48,910 --> 00:01:52,690
The second holds values from 20 to 39.
40
00:01:52,690 --> 00:01:56,220
The third holds values from 40 to 59,
41
00:01:56,220 --> 00:01:59,080
the fourth 60 to 79
42
00:01:59,080 --> 00:02:01,940
and last 80 to 99.
43
00:02:01,940 --> 00:02:03,500
Then once we have all the values
44
00:02:03,500 --> 00:02:05,280
in their respective buckets,
45
00:02:05,280 --> 00:02:08,043
we sort the values within the buckets.
46
00:02:10,130 --> 00:02:13,080
And from there, it's a simple
matter to gather the results
47
00:02:13,080 --> 00:02:14,940
and return them to the original vector
48
00:02:14,940 --> 00:02:17,150
now in sorted order.
49
00:02:17,150 --> 00:02:19,440
And again, there's a lot of detail
50
00:02:19,440 --> 00:02:21,033
being swept under the rug here.
51
00:02:22,380 --> 00:02:26,370
For example, how do we decide
how many buckets to use
52
00:02:28,210 --> 00:02:29,790
and how do we sort the elements
53
00:02:29,790 --> 00:02:32,193
once they've been
distributed into the buckets?
54
00:02:33,260 --> 00:02:35,120
Let's set aside for the time being
55
00:02:35,120 --> 00:02:37,010
the question of how many buckets to use
56
00:02:37,010 --> 00:02:38,793
and focus on this second question.
57
00:02:39,670 --> 00:02:42,780
The answer is any number of ways.
58
00:02:42,780 --> 00:02:45,950
We could recursively call
bucket sort on each bucket
59
00:02:45,950 --> 00:02:48,980
as we've seen with the
divide and conquer approach
60
00:02:48,980 --> 00:02:51,100
or we could use another sort algorithm
61
00:02:51,100 --> 00:02:55,550
like insertion sort to sort the
elements within each bucket.
62
00:02:55,550 --> 00:02:57,640
And there are times when
you'd like to implement
63
00:02:57,640 --> 00:03:00,330
bucket sort in different ways
64
00:03:00,330 --> 00:03:03,230
and different sorting
algorithms might be appropriate
65
00:03:03,230 --> 00:03:04,143
at this level.
66
00:03:06,270 --> 00:03:09,310
Now to the question of
how many buckets to use.
67
00:03:09,310 --> 00:03:12,070
Obviously we don't want just
one, that would be no help.
68
00:03:12,070 --> 00:03:13,580
We take the whole input vector,
69
00:03:13,580 --> 00:03:16,930
dump it into a single bucket
and nothing would be changed.
70
00:03:16,930 --> 00:03:18,350
And we don't want the same number
71
00:03:18,350 --> 00:03:20,680
of buckets as there are
elements in the vector
72
00:03:20,680 --> 00:03:22,410
that too would be no help.
73
00:03:22,410 --> 00:03:24,770
So the answer is, again, it depends
74
00:03:24,770 --> 00:03:26,653
and there's no hard and fast rule.
75
00:03:27,500 --> 00:03:31,130
It depends on the nature and
distribution of your data.
76
00:03:31,130 --> 00:03:33,000
What do I mean by that?
77
00:03:33,000 --> 00:03:34,603
Let's look at some examples.
78
00:03:35,480 --> 00:03:39,900
Here we have a vector
3,000, 1,000 zero and 2,000
79
00:03:40,770 --> 00:03:42,770
and we wanna sort this using bucket sort
80
00:03:43,680 --> 00:03:47,290
but notice that the values
are pretty well spread out.
81
00:03:47,290 --> 00:03:49,550
They range from 3000 to zero.
82
00:03:49,550 --> 00:03:51,480
They are only four values.
83
00:03:51,480 --> 00:03:53,070
So that's kind of sparse.
84
00:03:53,070 --> 00:03:54,653
We'd call that sparse.
85
00:03:55,510 --> 00:03:56,800
And if your data are sparse
86
00:03:56,800 --> 00:03:59,160
you don't wanna have too many buckets.
87
00:03:59,160 --> 00:04:01,960
Let's say we had this vector as data.
88
00:04:01,960 --> 00:04:03,450
And if we want to sort these
89
00:04:03,450 --> 00:04:06,800
it wouldn't do as much
good to have 600 buckets.
90
00:04:06,800 --> 00:04:08,360
We'd wanna have something say
91
00:04:08,360 --> 00:04:10,193
between three and five buckets.
92
00:04:11,330 --> 00:04:13,180
Now look at this vector.
93
00:04:13,180 --> 00:04:14,650
This is very different.
94
00:04:14,650 --> 00:04:16,400
What if we had data like this,
95
00:04:16,400 --> 00:04:19,600
densely clustered around a single value.
96
00:04:19,600 --> 00:04:23,560
Notice that all the values
are pretty close to 2000.
97
00:04:23,560 --> 00:04:27,570
There's a big gap from zero to 1998.
98
00:04:27,570 --> 00:04:30,600
There's nothing beyond 2010
99
00:04:30,600 --> 00:04:35,233
and everything is closely
clustered around that value 2000.
100
00:04:37,070 --> 00:04:39,990
So what happens if we have data like this?
101
00:04:39,990 --> 00:04:41,950
If we make our buckets too big
102
00:04:41,950 --> 00:04:45,260
then all these values might
wind up in the same bucket.
103
00:04:45,260 --> 00:04:47,970
And as you might expect,
worst case performance
104
00:04:47,970 --> 00:04:50,210
for bucket sort occurs when all the values
105
00:04:50,210 --> 00:04:51,793
fall into a single bucket.
106
00:04:53,530 --> 00:04:55,130
Which brings us to the question
107
00:04:55,130 --> 00:04:57,380
of Bucket Sort Complexity.
108
00:04:57,380 --> 00:05:00,770
Assuming data are reasonably
uniformly distributed,
109
00:05:00,770 --> 00:05:04,500
assuming that they are, and
that you have n elements
110
00:05:04,500 --> 00:05:06,170
and m buckets,
111
00:05:06,170 --> 00:05:08,310
distributing takes n steps
112
00:05:08,310 --> 00:05:09,290
because we have n elements
113
00:05:09,290 --> 00:05:11,773
we have to dish them out
into so many buckets.
114
00:05:12,610 --> 00:05:16,660
Sorting each bucket takes
f of n divided by n steps
115
00:05:16,660 --> 00:05:18,330
where f is the runtime
116
00:05:18,330 --> 00:05:22,000
of the function used to sort the buckets.
117
00:05:22,000 --> 00:05:23,200
So you recall, I said
118
00:05:23,200 --> 00:05:26,330
we could use a bucket sort recursively
119
00:05:26,330 --> 00:05:29,080
or we could use insertion sort.
120
00:05:29,080 --> 00:05:33,333
This step depends on the sorting
algorithm that we choose.
121
00:05:34,410 --> 00:05:37,700
But with m buckets, we multiply and we get
122
00:05:37,700 --> 00:05:39,910
m times f of n divided by n
123
00:05:41,560 --> 00:05:43,400
and gathering takes n steps.
124
00:05:43,400 --> 00:05:46,750
We have n elements, it takes
n steps to gather them.
125
00:05:46,750 --> 00:05:48,850
So we wind up with big O of n
126
00:05:49,700 --> 00:05:53,810
plus big O of m times f of n divided by n
127
00:05:53,810 --> 00:05:54,690
plus O of n
128
00:05:56,140 --> 00:05:56,973
and
129
00:05:58,030 --> 00:05:59,590
this,
130
00:05:59,590 --> 00:06:01,850
n divided by n, that's just the number
131
00:06:01,850 --> 00:06:04,000
of elements divided by
the number of buckets.
132
00:06:04,000 --> 00:06:05,960
That's a constant.
133
00:06:05,960 --> 00:06:10,130
And we have two O of n terms
that gets us O of two n
134
00:06:10,130 --> 00:06:12,960
and we just ignore the
leading coefficient.
135
00:06:12,960 --> 00:06:17,143
And so this whole thing
simplifies to big O of n plus n.
136
00:06:18,240 --> 00:06:19,490
And notice that this means
137
00:06:19,490 --> 00:06:22,633
that performance depends
on the number of buckets.
138
00:06:23,710 --> 00:06:25,780
Also remember that this only holds true
139
00:06:25,780 --> 00:06:28,410
if the assumption we gave before is true,
140
00:06:28,410 --> 00:06:31,343
that data are reasonably
uniformly distributed.
141
00:06:32,420 --> 00:06:35,003
Now let's add bucket sort
to our comparison chart.
142
00:06:35,940 --> 00:06:37,440
Here we have bucket sort.
143
00:06:37,440 --> 00:06:41,790
We see that it has big O
of n plus m time complexity
144
00:06:41,790 --> 00:06:46,210
big O of n plus m space complexity.
145
00:06:46,210 --> 00:06:49,470
And it depends, whether
it's stable or not.
146
00:06:49,470 --> 00:06:51,070
What does it depend on?
147
00:06:51,070 --> 00:06:53,020
It depends on the sorting algorithm
148
00:06:53,020 --> 00:06:56,290
that's used to sort the
elements within each bucket.
149
00:06:56,290 --> 00:06:58,343
If we choose a stable sorting algorithm
150
00:06:58,343 --> 00:07:00,600
then bucket sort will be stable.
151
00:07:00,600 --> 00:07:03,010
If we choose an unstable sorting algorithm
152
00:07:03,010 --> 00:07:04,993
than bucket sort will not be stable.
153
00:07:05,900 --> 00:07:07,970
And this note is important
154
00:07:07,970 --> 00:07:10,970
bucket sort requires
that certain properties
155
00:07:10,970 --> 00:07:13,903
hold for the data in
order to be effective.
156
00:07:15,370 --> 00:07:17,850
Now, let's look at a cousin of bucket sort
157
00:07:17,850 --> 00:07:19,053
called radix sort.
158
00:07:20,440 --> 00:07:23,620
Radix sort is a related
algorithm that works with data
159
00:07:23,620 --> 00:07:25,930
that could be sorted lexicographically.
160
00:07:25,930 --> 00:07:26,830
What does that mean?
161
00:07:26,830 --> 00:07:28,013
Lexicographically?
162
00:07:28,850 --> 00:07:31,040
Well, lexicographic sorting is just like
163
00:07:31,040 --> 00:07:33,340
sorting words in a dictionary.
164
00:07:33,340 --> 00:07:34,660
In a dictionary,
165
00:07:34,660 --> 00:07:36,920
we sort on the first letter
166
00:07:36,920 --> 00:07:40,190
and then on the second letter and so on.
167
00:07:40,190 --> 00:07:43,120
So aardvark comes before Abacus.
168
00:07:43,120 --> 00:07:45,480
That's what we mean by
lexicographic sorting,
169
00:07:45,480 --> 00:07:47,023
it's dictionary sorting.
170
00:07:48,280 --> 00:07:50,850
And here we'll take a
look at sorting numbers
171
00:07:50,850 --> 00:07:52,420
based on their digits
172
00:07:52,420 --> 00:07:55,100
but we're going to work from
the least significant digit
173
00:07:55,100 --> 00:07:57,380
to the most significant digit.
174
00:07:57,380 --> 00:08:00,390
This is called LSD radix sort.
175
00:08:00,390 --> 00:08:03,800
And radix is just another word for base
176
00:08:03,800 --> 00:08:06,090
as in base two for binary
177
00:08:06,090 --> 00:08:08,830
or base 10 in our
everyday counting system.
178
00:08:08,830 --> 00:08:11,130
The radix is the number of digits
179
00:08:11,130 --> 00:08:13,190
that we have in our system.
180
00:08:13,190 --> 00:08:16,690
So for base 10, we have zero through nine
181
00:08:16,690 --> 00:08:21,110
and radix sort complexity
is big O of n times d
182
00:08:21,110 --> 00:08:23,750
where d is the maximum number of digits
183
00:08:23,750 --> 00:08:26,233
among all the numbers we wish to sort.
184
00:08:27,470 --> 00:08:31,920
So radix sort can outperform
big O of n log n algorithms
185
00:08:31,920 --> 00:08:36,633
whenever d, the number of
digits is less than log n.
186
00:08:37,570 --> 00:08:40,933
Now it may surprise you,
but radix sort is old.
187
00:08:42,000 --> 00:08:43,720
Here's a tabulating machine
188
00:08:43,720 --> 00:08:46,460
that was used to sort punched cards
189
00:08:46,460 --> 00:08:51,093
in the 1890 census using
the radix sorting algorithm.
190
00:08:51,960 --> 00:08:54,760
And while it's been around a long time
191
00:08:54,760 --> 00:08:57,250
it wasn't until the mid 1950s
192
00:08:57,250 --> 00:08:59,360
that an efficient implementation
193
00:08:59,360 --> 00:09:01,980
was invented to run on a computer.
194
00:09:01,980 --> 00:09:05,580
So we've known about radix
sort for quite some time
195
00:09:05,580 --> 00:09:08,880
but it's only been within
the last 50 or 60 years
196
00:09:08,880 --> 00:09:12,043
that we've been able to run
it effectively on computers.
197
00:09:14,610 --> 00:09:16,910
Here's pseudocode for radix sort
198
00:09:16,910 --> 00:09:19,720
and note that this is
for lexicographic sorting
199
00:09:19,720 --> 00:09:22,350
of integers by digit.
200
00:09:22,350 --> 00:09:24,600
We take an input vector,
201
00:09:24,600 --> 00:09:28,420
we're working in base 10,
so we make 10 buckets.
202
00:09:28,420 --> 00:09:30,290
And then for each digit
203
00:09:30,290 --> 00:09:34,650
let's say our largest number is 4,212
204
00:09:34,650 --> 00:09:39,350
we have four digits, for
each digit for each element
205
00:09:39,350 --> 00:09:41,870
we distribute into the appropriate bucket
206
00:09:41,870 --> 00:09:43,640
based on the digit.
207
00:09:43,640 --> 00:09:44,920
And then for each bucket
208
00:09:44,920 --> 00:09:47,960
we gather the results back
into the original vector.
209
00:09:47,960 --> 00:09:50,263
And we repeat that for each digit.
210
00:09:51,290 --> 00:09:53,430
Let's see what that means.
211
00:09:53,430 --> 00:09:55,700
Let's say we have this vector
212
00:09:55,700 --> 00:09:58,010
one nine two three seven nine and so on,
213
00:09:58,010 --> 00:10:02,220
notice that I've inserted
a leading zero here
214
00:10:02,220 --> 00:10:04,430
two leading zeros here,
215
00:10:04,430 --> 00:10:06,300
leading zero here
216
00:10:06,300 --> 00:10:08,470
to show that
217
00:10:08,470 --> 00:10:11,030
when we sort on this digit
218
00:10:11,030 --> 00:10:12,570
that this value is zero.
219
00:10:12,570 --> 00:10:15,973
And when we sort on this
digit, this value is zero.
220
00:10:17,160 --> 00:10:19,900
So for our first sort,
remember that we're working
221
00:10:19,900 --> 00:10:23,510
from least significant digit
to most significant digit.
222
00:10:23,510 --> 00:10:27,070
We're gonna sort on these digits in red
223
00:10:27,070 --> 00:10:31,350
two, nine, two, seven,
four, five, four, three.
224
00:10:31,350 --> 00:10:35,120
And if we do that, then
we get this result.
225
00:10:35,120 --> 00:10:36,670
This is our resulting vector.
226
00:10:36,670 --> 00:10:39,320
Notice that the twos begin for us
227
00:10:39,320 --> 00:10:43,223
then the threes, the fours,
the fives, seven and nine.
228
00:10:44,320 --> 00:10:47,480
Then the next sort, we're going to sort
229
00:10:47,480 --> 00:10:49,560
by the second digit
230
00:10:49,560 --> 00:10:52,830
nine, one, zero, zero and so on.
231
00:10:52,830 --> 00:10:56,980
And if we do that, we'll
get this resulting vector,
232
00:10:56,980 --> 00:10:59,630
notice now that the zeros are leading,
233
00:10:59,630 --> 00:11:04,023
followed by the ones, a
five, two sevens and a nine.
234
00:11:05,360 --> 00:11:08,030
Finally, since we have three digits
235
00:11:08,030 --> 00:11:10,320
we sort on that third digit
236
00:11:10,320 --> 00:11:11,500
and now see we have
237
00:11:11,500 --> 00:11:16,500
two, zero, zero, zero,
four, two, three, one.
238
00:11:16,520 --> 00:11:19,140
And if we sort on that third digit
239
00:11:20,450 --> 00:11:21,710
we get this result
240
00:11:22,610 --> 00:11:27,610
zero, zero, zero, one,
two, two, three, four.
241
00:11:28,390 --> 00:11:30,020
And now you'll notice
242
00:11:31,130 --> 00:11:33,720
that our vector is in sorted order.
243
00:11:33,720 --> 00:11:37,403
So we sorted our vector without
performing any comparisons.
244
00:11:39,390 --> 00:11:41,380
Now, how do we get the value of a digit
245
00:11:41,380 --> 00:11:43,800
without performing comparisons?
246
00:11:43,800 --> 00:11:45,740
Well, we take the number
247
00:11:45,740 --> 00:11:49,920
we divide by the appropriate
power of our radix or base,
248
00:11:49,920 --> 00:11:52,870
and then we take that
value modular the radix.
249
00:11:52,870 --> 00:11:54,790
So if we're working in base 10,
250
00:11:54,790 --> 00:11:56,743
as we've been in these examples,
251
00:11:57,590 --> 00:12:01,380
to get the third digit of 2,708
252
00:12:01,380 --> 00:12:04,290
we integer divide by 10 to the two,
253
00:12:04,290 --> 00:12:09,140
notice that we zero index the
digits, which gives us 27.
254
00:12:09,140 --> 00:12:13,750
And then we take 27 modular
10, which is our radix,
255
00:12:13,750 --> 00:12:15,600
and that yields seven.
256
00:12:15,600 --> 00:12:17,940
And so the third digit
257
00:12:17,940 --> 00:12:20,410
of 2,708
258
00:12:20,410 --> 00:12:21,510
is seven.
259
00:12:21,510 --> 00:12:24,800
So we now know that on our third iteration
260
00:12:24,800 --> 00:12:29,160
2,708 will get distributed
to the sevens bucket.
261
00:12:29,160 --> 00:12:32,293
Notice that the buckets
are also zero indexed.
262
00:12:33,160 --> 00:12:35,123
Let's take a look on visual algo.
263
00:12:36,810 --> 00:12:40,673
Notice that we have input
values from one to 9,680.
264
00:12:43,390 --> 00:12:46,290
We create those 10 buckets
265
00:12:47,530 --> 00:12:49,310
and they're using red
266
00:12:49,310 --> 00:12:52,010
to indicate the digit
that we're sorting on,
267
00:12:52,010 --> 00:12:53,883
the least significant digit.
268
00:12:55,039 --> 00:12:57,433
And then we dish them
back out to the vector.
269
00:13:01,110 --> 00:13:03,290
Then we do that again for the second digit
270
00:13:13,346 --> 00:13:16,800
and notice that we take the
elements out of the buckets
271
00:13:16,800 --> 00:13:18,623
in the order that they went in.
272
00:13:21,060 --> 00:13:23,000
Now we do it for the third digit
273
00:13:32,314 --> 00:13:34,000
and dish them back out to the vector,
274
00:13:34,000 --> 00:13:36,510
and then we'll make one more
pass for the fourth digit,
275
00:13:36,510 --> 00:13:37,610
and we'll be complete.
276
00:13:54,520 --> 00:13:57,233
We skip the empty buckets. That's simple,
277
00:13:59,600 --> 00:14:00,550
and now we're done.
278
00:14:05,070 --> 00:14:07,660
Now we return to our comparison chart
279
00:14:07,660 --> 00:14:10,040
and we've added radix sort.
280
00:14:10,040 --> 00:14:14,150
We see that the time complexity
is big O of n times d
281
00:14:14,150 --> 00:14:18,470
where d is the number of digits
that we have to work with.
282
00:14:18,470 --> 00:14:21,680
And n is the number of
elements in our vector.
283
00:14:21,680 --> 00:14:23,310
Space complexity
284
00:14:23,310 --> 00:14:25,220
is O of n plus d
285
00:14:26,170 --> 00:14:28,310
radix sort is stable
286
00:14:28,310 --> 00:14:30,350
but it also requires that the data
287
00:14:30,350 --> 00:14:32,410
can be sorted lexicographically.
288
00:14:32,410 --> 00:14:34,763
It won't work with any other kind of data.
289
00:14:36,700 --> 00:14:40,170
So in summary, bucket sort and radix sort,
290
00:14:40,170 --> 00:14:43,400
perform sorting without
making comparisons.
291
00:14:43,400 --> 00:14:45,193
They distribute and collect.
292
00:14:46,070 --> 00:14:49,380
They can outperform big
O of n log n algorithms,
293
00:14:49,380 --> 00:14:51,630
but only under certain circumstances
294
00:14:51,630 --> 00:14:54,170
the data have to have certain properties,
295
00:14:54,170 --> 00:14:56,320
both in terms of the data type
296
00:14:56,320 --> 00:14:58,033
and the distribution of the data.
297
00:14:59,080 --> 00:15:02,260
In the next video, we'll
implement a version
298
00:15:02,260 --> 00:15:04,700
of radix sort in C plus plus.
299
00:15:04,700 --> 00:15:05,653
That's all for now.