1
00:00:00,870 --> 00:00:03,063
- [Instructor] Bucket Sort and Radix Sort.

2
00:00:05,130 --> 00:00:06,650
Bucket sort and radix sort

3
00:00:06,650 --> 00:00:08,860
work by distributing and collecting

4
00:00:08,860 --> 00:00:10,910
the elements to be sorted.

5
00:00:10,910 --> 00:00:13,530
They don't perform comparisons.

6
00:00:13,530 --> 00:00:15,260
This is a different approach

7
00:00:15,260 --> 00:00:17,233
from any that we've seen so far.

8
00:00:18,360 --> 00:00:20,590
All the algorithms we've seen so far

9
00:00:20,590 --> 00:00:23,520
use comparison to perform the sort

10
00:00:23,520 --> 00:00:25,730
and these algorithms can't do any better

11
00:00:25,730 --> 00:00:27,283
than big O of n log n.

12
00:00:28,800 --> 00:00:32,660
But bucket sort and radix sort
don't perform comparisons.

13
00:00:32,660 --> 00:00:34,820
They're sorting algorithms
that work best when

14
00:00:34,820 --> 00:00:38,400
we have more or less
uniformly distributed data.

15
00:00:38,400 --> 00:00:40,340
And we'll see that bucket sort works best

16
00:00:40,340 --> 00:00:43,523
when we have a limited
range of possible values.

17
00:00:46,240 --> 00:00:48,300
Here's pseudocode for bucket sort

18
00:00:48,300 --> 00:00:50,630
adapted from the textbook.

19
00:00:50,630 --> 00:00:52,700
We take an input vector.

20
00:00:52,700 --> 00:00:54,260
We make some buckets

21
00:00:54,260 --> 00:00:57,470
to places where we're
going to hold elements.

22
00:00:57,470 --> 00:01:00,000
We distribute the
elements into the buckets

23
00:01:00,850 --> 00:01:03,653
then we sort the items within each bucket.

24
00:01:04,500 --> 00:01:06,660
And then ,we gather the results back

25
00:01:06,660 --> 00:01:08,123
into the original vector.

26
00:01:08,980 --> 00:01:11,880
Needless to say, this pseudocode here

27
00:01:11,880 --> 00:01:14,053
sweeps a lot of detail under the rug.

28
00:01:15,650 --> 00:01:17,660
Let's look at this in
a little more detail.

29
00:01:17,660 --> 00:01:19,940
This again is adapted from the textbook

30
00:01:19,940 --> 00:01:21,633
figure six dash 12.

31
00:01:23,440 --> 00:01:26,880
We start with an unsorted
vector of values.

32
00:01:26,880 --> 00:01:28,920
Notice that we have 10 values

33
00:01:28,920 --> 00:01:31,830
and their range is relatively restricted.

34
00:01:31,830 --> 00:01:36,710
All of these values fall
within the range zero to 100.

35
00:01:36,710 --> 00:01:39,003
We don't have 10 million, for example.

36
00:01:39,910 --> 00:01:41,710
First we distribute the elements

37
00:01:41,710 --> 00:01:44,090
into the appropriate buckets.

38
00:01:44,090 --> 00:01:48,910
This first bucket here,
holds values from zero to 19.

39
00:01:48,910 --> 00:01:52,690
The second holds values from 20 to 39.

40
00:01:52,690 --> 00:01:56,220
The third holds values from 40 to 59,

41
00:01:56,220 --> 00:01:59,080
the fourth 60 to 79

42
00:01:59,080 --> 00:02:01,940
and last 80 to 99.

43
00:02:01,940 --> 00:02:03,500
Then once we have all the values

44
00:02:03,500 --> 00:02:05,280
in their respective buckets,

45
00:02:05,280 --> 00:02:08,043
we sort the values within the buckets.

46
00:02:10,130 --> 00:02:13,080
And from there, it's a simple
matter to gather the results

47
00:02:13,080 --> 00:02:14,940
and return them to the original vector

48
00:02:14,940 --> 00:02:17,150
now in sorted order.

49
00:02:17,150 --> 00:02:19,440
And again, there's a lot of detail

50
00:02:19,440 --> 00:02:21,033
being swept under the rug here.

51
00:02:22,380 --> 00:02:26,370
For example, how do we decide
how many buckets to use

52
00:02:28,210 --> 00:02:29,790
and how do we sort the elements

53
00:02:29,790 --> 00:02:32,193
once they've been
distributed into the buckets?

54
00:02:33,260 --> 00:02:35,120
Let's set aside for the time being

55
00:02:35,120 --> 00:02:37,010
the question of how many buckets to use

56
00:02:37,010 --> 00:02:38,793
and focus on this second question.

57
00:02:39,670 --> 00:02:42,780
The answer is any number of ways.

58
00:02:42,780 --> 00:02:45,950
We could recursively call
bucket sort on each bucket

59
00:02:45,950 --> 00:02:48,980
as we've seen with the
divide and conquer approach

60
00:02:48,980 --> 00:02:51,100
or we could use another sort algorithm

61
00:02:51,100 --> 00:02:55,550
like insertion sort to sort the
elements within each bucket.

62
00:02:55,550 --> 00:02:57,640
And there are times when
you'd like to implement

63
00:02:57,640 --> 00:03:00,330
bucket sort in different ways

64
00:03:00,330 --> 00:03:03,230
and different sorting
algorithms might be appropriate

65
00:03:03,230 --> 00:03:04,143
at this level.

66
00:03:06,270 --> 00:03:09,310
Now to the question of
how many buckets to use.

67
00:03:09,310 --> 00:03:12,070
Obviously we don't want just
one, that would be no help.

68
00:03:12,070 --> 00:03:13,580
We take the whole input vector,

69
00:03:13,580 --> 00:03:16,930
dump it into a single bucket
and nothing would be changed.

70
00:03:16,930 --> 00:03:18,350
And we don't want the same number

71
00:03:18,350 --> 00:03:20,680
of buckets as there are
elements in the vector

72
00:03:20,680 --> 00:03:22,410
that too would be no help.

73
00:03:22,410 --> 00:03:24,770
So the answer is, again, it depends

74
00:03:24,770 --> 00:03:26,653
and there's no hard and fast rule.

75
00:03:27,500 --> 00:03:31,130
It depends on the nature and
distribution of your data.

76
00:03:31,130 --> 00:03:33,000
What do I mean by that?

77
00:03:33,000 --> 00:03:34,603
Let's look at some examples.

78
00:03:35,480 --> 00:03:39,900
Here we have a vector
3,000, 1,000 zero and 2,000

79
00:03:40,770 --> 00:03:42,770
and we wanna sort this using bucket sort

80
00:03:43,680 --> 00:03:47,290
but notice that the values
are pretty well spread out.

81
00:03:47,290 --> 00:03:49,550
They range from 3000 to zero.

82
00:03:49,550 --> 00:03:51,480
They are only four values.

83
00:03:51,480 --> 00:03:53,070
So that's kind of sparse.

84
00:03:53,070 --> 00:03:54,653
We'd call that sparse.

85
00:03:55,510 --> 00:03:56,800
And if your data are sparse

86
00:03:56,800 --> 00:03:59,160
you don't wanna have too many buckets.

87
00:03:59,160 --> 00:04:01,960
Let's say we had this vector as data.

88
00:04:01,960 --> 00:04:03,450
And if we want to sort these

89
00:04:03,450 --> 00:04:06,800
it wouldn't do as much
good to have 600 buckets.

90
00:04:06,800 --> 00:04:08,360
We'd wanna have something say

91
00:04:08,360 --> 00:04:10,193
between three and five buckets.

92
00:04:11,330 --> 00:04:13,180
Now look at this vector.

93
00:04:13,180 --> 00:04:14,650
This is very different.

94
00:04:14,650 --> 00:04:16,400
What if we had data like this,

95
00:04:16,400 --> 00:04:19,600
densely clustered around a single value.

96
00:04:19,600 --> 00:04:23,560
Notice that all the values
are pretty close to 2000.

97
00:04:23,560 --> 00:04:27,570
There's a big gap from zero to 1998.

98
00:04:27,570 --> 00:04:30,600
There's nothing beyond 2010

99
00:04:30,600 --> 00:04:35,233
and everything is closely
clustered around that value 2000.

100
00:04:37,070 --> 00:04:39,990
So what happens if we have data like this?

101
00:04:39,990 --> 00:04:41,950
If we make our buckets too big

102
00:04:41,950 --> 00:04:45,260
then all these values might
wind up in the same bucket.

103
00:04:45,260 --> 00:04:47,970
And as you might expect,
worst case performance

104
00:04:47,970 --> 00:04:50,210
for bucket sort occurs when all the values

105
00:04:50,210 --> 00:04:51,793
fall into a single bucket.

106
00:04:53,530 --> 00:04:55,130
Which brings us to the question

107
00:04:55,130 --> 00:04:57,380
of Bucket Sort Complexity.

108
00:04:57,380 --> 00:05:00,770
Assuming data are reasonably
uniformly distributed,

109
00:05:00,770 --> 00:05:04,500
assuming that they are, and
that you have n elements

110
00:05:04,500 --> 00:05:06,170
and m buckets,

111
00:05:06,170 --> 00:05:08,310
distributing takes n steps

112
00:05:08,310 --> 00:05:09,290
because we have n elements

113
00:05:09,290 --> 00:05:11,773
we have to dish them out
into so many buckets.

114
00:05:12,610 --> 00:05:16,660
Sorting each bucket takes
f of n divided by n steps

115
00:05:16,660 --> 00:05:18,330
where f is the runtime

116
00:05:18,330 --> 00:05:22,000
of the function used to sort the buckets.

117
00:05:22,000 --> 00:05:23,200
So you recall, I said

118
00:05:23,200 --> 00:05:26,330
we could use a bucket sort recursively

119
00:05:26,330 --> 00:05:29,080
or we could use insertion sort.

120
00:05:29,080 --> 00:05:33,333
This step depends on the sorting
algorithm that we choose.

121
00:05:34,410 --> 00:05:37,700
But with m buckets, we multiply and we get

122
00:05:37,700 --> 00:05:39,910
m times f of n divided by n

123
00:05:41,560 --> 00:05:43,400
and gathering takes n steps.

124
00:05:43,400 --> 00:05:46,750
We have n elements, it takes
n steps to gather them.

125
00:05:46,750 --> 00:05:48,850
So we wind up with big O of n

126
00:05:49,700 --> 00:05:53,810
plus big O of m times f of n divided by n

127
00:05:53,810 --> 00:05:54,690
plus O of n

128
00:05:56,140 --> 00:05:56,973
and

129
00:05:58,030 --> 00:05:59,590
this,

130
00:05:59,590 --> 00:06:01,850
n divided by n, that's just the number

131
00:06:01,850 --> 00:06:04,000
of elements divided by
the number of buckets.

132
00:06:04,000 --> 00:06:05,960
That's a constant.

133
00:06:05,960 --> 00:06:10,130
And we have two O of n terms
that gets us O of two n

134
00:06:10,130 --> 00:06:12,960
and we just ignore the
leading coefficient.

135
00:06:12,960 --> 00:06:17,143
And so this whole thing
simplifies to big O of n plus n.

136
00:06:18,240 --> 00:06:19,490
And notice that this means

137
00:06:19,490 --> 00:06:22,633
that performance depends
on the number of buckets.

138
00:06:23,710 --> 00:06:25,780
Also remember that this only holds true

139
00:06:25,780 --> 00:06:28,410
if the assumption we gave before is true,

140
00:06:28,410 --> 00:06:31,343
that data are reasonably
uniformly distributed.

141
00:06:32,420 --> 00:06:35,003
Now let's add bucket sort
to our comparison chart.

142
00:06:35,940 --> 00:06:37,440
Here we have bucket sort.

143
00:06:37,440 --> 00:06:41,790
We see that it has big O
of n plus m time complexity

144
00:06:41,790 --> 00:06:46,210
big O of n plus m space complexity.

145
00:06:46,210 --> 00:06:49,470
And it depends, whether
it's stable or not.

146
00:06:49,470 --> 00:06:51,070
What does it depend on?

147
00:06:51,070 --> 00:06:53,020
It depends on the sorting algorithm

148
00:06:53,020 --> 00:06:56,290
that's used to sort the
elements within each bucket.

149
00:06:56,290 --> 00:06:58,343
If we choose a stable sorting algorithm

150
00:06:58,343 --> 00:07:00,600
then bucket sort will be stable.

151
00:07:00,600 --> 00:07:03,010
If we choose an unstable sorting algorithm

152
00:07:03,010 --> 00:07:04,993
than bucket sort will not be stable.

153
00:07:05,900 --> 00:07:07,970
And this note is important

154
00:07:07,970 --> 00:07:10,970
bucket sort requires
that certain properties

155
00:07:10,970 --> 00:07:13,903
hold for the data in
order to be effective.

156
00:07:15,370 --> 00:07:17,850
Now, let's look at a cousin of bucket sort

157
00:07:17,850 --> 00:07:19,053
called radix sort.

158
00:07:20,440 --> 00:07:23,620
Radix sort is a related
algorithm that works with data

159
00:07:23,620 --> 00:07:25,930
that could be sorted lexicographically.

160
00:07:25,930 --> 00:07:26,830
What does that mean?

161
00:07:26,830 --> 00:07:28,013
Lexicographically?

162
00:07:28,850 --> 00:07:31,040
Well, lexicographic sorting is just like

163
00:07:31,040 --> 00:07:33,340
sorting words in a dictionary.

164
00:07:33,340 --> 00:07:34,660
In a dictionary,

165
00:07:34,660 --> 00:07:36,920
we sort on the first letter

166
00:07:36,920 --> 00:07:40,190
and then on the second letter and so on.

167
00:07:40,190 --> 00:07:43,120
So aardvark comes before Abacus.

168
00:07:43,120 --> 00:07:45,480
That's what we mean by
lexicographic sorting,

169
00:07:45,480 --> 00:07:47,023
it's dictionary sorting.

170
00:07:48,280 --> 00:07:50,850
And here we'll take a
look at sorting numbers

171
00:07:50,850 --> 00:07:52,420
based on their digits

172
00:07:52,420 --> 00:07:55,100
but we're going to work from
the least significant digit

173
00:07:55,100 --> 00:07:57,380
to the most significant digit.

174
00:07:57,380 --> 00:08:00,390
This is called LSD radix sort.

175
00:08:00,390 --> 00:08:03,800
And radix is just another word for base

176
00:08:03,800 --> 00:08:06,090
as in base two for binary

177
00:08:06,090 --> 00:08:08,830
or base 10 in our
everyday counting system.

178
00:08:08,830 --> 00:08:11,130
The radix is the number of digits

179
00:08:11,130 --> 00:08:13,190
that we have in our system.

180
00:08:13,190 --> 00:08:16,690
So for base 10, we have zero through nine

181
00:08:16,690 --> 00:08:21,110
and radix sort complexity
is big O of n times d

182
00:08:21,110 --> 00:08:23,750
where d is the maximum number of digits

183
00:08:23,750 --> 00:08:26,233
among all the numbers we wish to sort.

184
00:08:27,470 --> 00:08:31,920
So radix sort can outperform
big O of n log n algorithms

185
00:08:31,920 --> 00:08:36,633
whenever d, the number of
digits is less than log n.

186
00:08:37,570 --> 00:08:40,933
Now it may surprise you,
but radix sort is old.

187
00:08:42,000 --> 00:08:43,720
Here's a tabulating machine

188
00:08:43,720 --> 00:08:46,460
that was used to sort punched cards

189
00:08:46,460 --> 00:08:51,093
in the 1890 census using
the radix sorting algorithm.

190
00:08:51,960 --> 00:08:54,760
And while it's been around a long time

191
00:08:54,760 --> 00:08:57,250
it wasn't until the mid 1950s

192
00:08:57,250 --> 00:08:59,360
that an efficient implementation

193
00:08:59,360 --> 00:09:01,980
was invented to run on a computer.

194
00:09:01,980 --> 00:09:05,580
So we've known about radix
sort for quite some time

195
00:09:05,580 --> 00:09:08,880
but it's only been within
the last 50 or 60 years

196
00:09:08,880 --> 00:09:12,043
that we've been able to run
it effectively on computers.

197
00:09:14,610 --> 00:09:16,910
Here's pseudocode for radix sort

198
00:09:16,910 --> 00:09:19,720
and note that this is
for lexicographic sorting

199
00:09:19,720 --> 00:09:22,350
of integers by digit.

200
00:09:22,350 --> 00:09:24,600
We take an input vector,

201
00:09:24,600 --> 00:09:28,420
we're working in base 10,
so we make 10 buckets.

202
00:09:28,420 --> 00:09:30,290
And then for each digit

203
00:09:30,290 --> 00:09:34,650
let's say our largest number is 4,212

204
00:09:34,650 --> 00:09:39,350
we have four digits, for
each digit for each element

205
00:09:39,350 --> 00:09:41,870
we distribute into the appropriate bucket

206
00:09:41,870 --> 00:09:43,640
based on the digit.

207
00:09:43,640 --> 00:09:44,920
And then for each bucket

208
00:09:44,920 --> 00:09:47,960
we gather the results back
into the original vector.

209
00:09:47,960 --> 00:09:50,263
And we repeat that for each digit.

210
00:09:51,290 --> 00:09:53,430
Let's see what that means.

211
00:09:53,430 --> 00:09:55,700
Let's say we have this vector

212
00:09:55,700 --> 00:09:58,010
one nine two three seven nine and so on,

213
00:09:58,010 --> 00:10:02,220
notice that I've inserted
a leading zero here

214
00:10:02,220 --> 00:10:04,430
two leading zeros here,

215
00:10:04,430 --> 00:10:06,300
leading zero here

216
00:10:06,300 --> 00:10:08,470
to show that

217
00:10:08,470 --> 00:10:11,030
when we sort on this digit

218
00:10:11,030 --> 00:10:12,570
that this value is zero.

219
00:10:12,570 --> 00:10:15,973
And when we sort on this
digit, this value is zero.

220
00:10:17,160 --> 00:10:19,900
So for our first sort,
remember that we're working

221
00:10:19,900 --> 00:10:23,510
from least significant digit
to most significant digit.

222
00:10:23,510 --> 00:10:27,070
We're gonna sort on these digits in red

223
00:10:27,070 --> 00:10:31,350
two, nine, two, seven,
four, five, four, three.

224
00:10:31,350 --> 00:10:35,120
And if we do that, then
we get this result.

225
00:10:35,120 --> 00:10:36,670
This is our resulting vector.

226
00:10:36,670 --> 00:10:39,320
Notice that the twos begin for us

227
00:10:39,320 --> 00:10:43,223
then the threes, the fours,
the fives, seven and nine.

228
00:10:44,320 --> 00:10:47,480
Then the next sort, we're going to sort

229
00:10:47,480 --> 00:10:49,560
by the second digit

230
00:10:49,560 --> 00:10:52,830
nine, one, zero, zero and so on.

231
00:10:52,830 --> 00:10:56,980
And if we do that, we'll
get this resulting vector,

232
00:10:56,980 --> 00:10:59,630
notice now that the zeros are leading,

233
00:10:59,630 --> 00:11:04,023
followed by the ones, a
five, two sevens and a nine.

234
00:11:05,360 --> 00:11:08,030
Finally, since we have three digits

235
00:11:08,030 --> 00:11:10,320
we sort on that third digit

236
00:11:10,320 --> 00:11:11,500
and now see we have

237
00:11:11,500 --> 00:11:16,500
two, zero, zero, zero,
four, two, three, one.

238
00:11:16,520 --> 00:11:19,140
And if we sort on that third digit

239
00:11:20,450 --> 00:11:21,710
we get this result

240
00:11:22,610 --> 00:11:27,610
zero, zero, zero, one,
two, two, three, four.

241
00:11:28,390 --> 00:11:30,020
And now you'll notice

242
00:11:31,130 --> 00:11:33,720
that our vector is in sorted order.

243
00:11:33,720 --> 00:11:37,403
So we sorted our vector without
performing any comparisons.

244
00:11:39,390 --> 00:11:41,380
Now, how do we get the value of a digit

245
00:11:41,380 --> 00:11:43,800
without performing comparisons?

246
00:11:43,800 --> 00:11:45,740
Well, we take the number

247
00:11:45,740 --> 00:11:49,920
we divide by the appropriate
power of our radix or base,

248
00:11:49,920 --> 00:11:52,870
and then we take that
value modular the radix.

249
00:11:52,870 --> 00:11:54,790
So if we're working in base 10,

250
00:11:54,790 --> 00:11:56,743
as we've been in these examples,

251
00:11:57,590 --> 00:12:01,380
to get the third digit of 2,708

252
00:12:01,380 --> 00:12:04,290
we integer divide by 10 to the two,

253
00:12:04,290 --> 00:12:09,140
notice that we zero index the
digits, which gives us 27.

254
00:12:09,140 --> 00:12:13,750
And then we take 27 modular
10, which is our radix,

255
00:12:13,750 --> 00:12:15,600
and that yields seven.

256
00:12:15,600 --> 00:12:17,940
And so the third digit

257
00:12:17,940 --> 00:12:20,410
of 2,708

258
00:12:20,410 --> 00:12:21,510
is seven.

259
00:12:21,510 --> 00:12:24,800
So we now know that on our third iteration

260
00:12:24,800 --> 00:12:29,160
2,708 will get distributed
to the sevens bucket.

261
00:12:29,160 --> 00:12:32,293
Notice that the buckets
are also zero indexed.

262
00:12:33,160 --> 00:12:35,123
Let's take a look on visual algo.

263
00:12:36,810 --> 00:12:40,673
Notice that we have input
values from one to 9,680.

264
00:12:43,390 --> 00:12:46,290
We create those 10 buckets

265
00:12:47,530 --> 00:12:49,310
and they're using red

266
00:12:49,310 --> 00:12:52,010
to indicate the digit
that we're sorting on,

267
00:12:52,010 --> 00:12:53,883
the least significant digit.

268
00:12:55,039 --> 00:12:57,433
And then we dish them
back out to the vector.

269
00:13:01,110 --> 00:13:03,290
Then we do that again for the second digit

270
00:13:13,346 --> 00:13:16,800
and notice that we take the
elements out of the buckets

271
00:13:16,800 --> 00:13:18,623
in the order that they went in.

272
00:13:21,060 --> 00:13:23,000
Now we do it for the third digit

273
00:13:32,314 --> 00:13:34,000
and dish them back out to the vector,

274
00:13:34,000 --> 00:13:36,510
and then we'll make one more
pass for the fourth digit,

275
00:13:36,510 --> 00:13:37,610
and we'll be complete.

276
00:13:54,520 --> 00:13:57,233
We skip the empty buckets. That's simple,

277
00:13:59,600 --> 00:14:00,550
and now we're done.

278
00:14:05,070 --> 00:14:07,660
Now we return to our comparison chart

279
00:14:07,660 --> 00:14:10,040
and we've added radix sort.

280
00:14:10,040 --> 00:14:14,150
We see that the time complexity
is big O of n times d

281
00:14:14,150 --> 00:14:18,470
where d is the number of digits
that we have to work with.

282
00:14:18,470 --> 00:14:21,680
And n is the number of
elements in our vector.

283
00:14:21,680 --> 00:14:23,310
Space complexity

284
00:14:23,310 --> 00:14:25,220
is O of n plus d

285
00:14:26,170 --> 00:14:28,310
radix sort is stable

286
00:14:28,310 --> 00:14:30,350
but it also requires that the data

287
00:14:30,350 --> 00:14:32,410
can be sorted lexicographically.

288
00:14:32,410 --> 00:14:34,763
It won't work with any other kind of data.

289
00:14:36,700 --> 00:14:40,170
So in summary, bucket sort and radix sort,

290
00:14:40,170 --> 00:14:43,400
perform sorting without
making comparisons.

291
00:14:43,400 --> 00:14:45,193
They distribute and collect.

292
00:14:46,070 --> 00:14:49,380
They can outperform big
O of n log n algorithms,

293
00:14:49,380 --> 00:14:51,630
but only under certain circumstances

294
00:14:51,630 --> 00:14:54,170
the data have to have certain properties,

295
00:14:54,170 --> 00:14:56,320
both in terms of the data type

296
00:14:56,320 --> 00:14:58,033
and the distribution of the data.

297
00:14:59,080 --> 00:15:02,260
In the next video, we'll
implement a version

298
00:15:02,260 --> 00:15:04,700
of radix sort in C plus plus.

299
00:15:04,700 --> 00:15:05,653
That's all for now.