1 00:00:00,870 --> 00:00:03,063 - [Instructor] Bucket Sort and Radix Sort. 2 00:00:05,130 --> 00:00:06,650 Bucket sort and radix sort 3 00:00:06,650 --> 00:00:08,860 work by distributing and collecting 4 00:00:08,860 --> 00:00:10,910 the elements to be sorted. 5 00:00:10,910 --> 00:00:13,530 They don't perform comparisons. 6 00:00:13,530 --> 00:00:15,260 This is a different approach 7 00:00:15,260 --> 00:00:17,233 from any that we've seen so far. 8 00:00:18,360 --> 00:00:20,590 All the algorithms we've seen so far 9 00:00:20,590 --> 00:00:23,520 use comparison to perform the sort 10 00:00:23,520 --> 00:00:25,730 and these algorithms can't do any better 11 00:00:25,730 --> 00:00:27,283 than big O of n log n. 12 00:00:28,800 --> 00:00:32,660 But bucket sort and radix sort don't perform comparisons. 13 00:00:32,660 --> 00:00:34,820 They're sorting algorithms that work best when 14 00:00:34,820 --> 00:00:38,400 we have more or less uniformly distributed data. 15 00:00:38,400 --> 00:00:40,340 And we'll see that bucket sort works best 16 00:00:40,340 --> 00:00:43,523 when we have a limited range of possible values. 17 00:00:46,240 --> 00:00:48,300 Here's pseudocode for bucket sort 18 00:00:48,300 --> 00:00:50,630 adapted from the textbook. 19 00:00:50,630 --> 00:00:52,700 We take an input vector. 20 00:00:52,700 --> 00:00:54,260 We make some buckets 21 00:00:54,260 --> 00:00:57,470 to places where we're going to hold elements. 22 00:00:57,470 --> 00:01:00,000 We distribute the elements into the buckets 23 00:01:00,850 --> 00:01:03,653 then we sort the items within each bucket. 24 00:01:04,500 --> 00:01:06,660 And then ,we gather the results back 25 00:01:06,660 --> 00:01:08,123 into the original vector. 26 00:01:08,980 --> 00:01:11,880 Needless to say, this pseudocode here 27 00:01:11,880 --> 00:01:14,053 sweeps a lot of detail under the rug. 28 00:01:15,650 --> 00:01:17,660 Let's look at this in a little more detail. 29 00:01:17,660 --> 00:01:19,940 This again is adapted from the textbook 30 00:01:19,940 --> 00:01:21,633 figure six dash 12. 31 00:01:23,440 --> 00:01:26,880 We start with an unsorted vector of values. 32 00:01:26,880 --> 00:01:28,920 Notice that we have 10 values 33 00:01:28,920 --> 00:01:31,830 and their range is relatively restricted. 34 00:01:31,830 --> 00:01:36,710 All of these values fall within the range zero to 100. 35 00:01:36,710 --> 00:01:39,003 We don't have 10 million, for example. 36 00:01:39,910 --> 00:01:41,710 First we distribute the elements 37 00:01:41,710 --> 00:01:44,090 into the appropriate buckets. 38 00:01:44,090 --> 00:01:48,910 This first bucket here, holds values from zero to 19. 39 00:01:48,910 --> 00:01:52,690 The second holds values from 20 to 39. 40 00:01:52,690 --> 00:01:56,220 The third holds values from 40 to 59, 41 00:01:56,220 --> 00:01:59,080 the fourth 60 to 79 42 00:01:59,080 --> 00:02:01,940 and last 80 to 99. 43 00:02:01,940 --> 00:02:03,500 Then once we have all the values 44 00:02:03,500 --> 00:02:05,280 in their respective buckets, 45 00:02:05,280 --> 00:02:08,043 we sort the values within the buckets. 46 00:02:10,130 --> 00:02:13,080 And from there, it's a simple matter to gather the results 47 00:02:13,080 --> 00:02:14,940 and return them to the original vector 48 00:02:14,940 --> 00:02:17,150 now in sorted order. 49 00:02:17,150 --> 00:02:19,440 And again, there's a lot of detail 50 00:02:19,440 --> 00:02:21,033 being swept under the rug here. 51 00:02:22,380 --> 00:02:26,370 For example, how do we decide how many buckets to use 52 00:02:28,210 --> 00:02:29,790 and how do we sort the elements 53 00:02:29,790 --> 00:02:32,193 once they've been distributed into the buckets? 54 00:02:33,260 --> 00:02:35,120 Let's set aside for the time being 55 00:02:35,120 --> 00:02:37,010 the question of how many buckets to use 56 00:02:37,010 --> 00:02:38,793 and focus on this second question. 57 00:02:39,670 --> 00:02:42,780 The answer is any number of ways. 58 00:02:42,780 --> 00:02:45,950 We could recursively call bucket sort on each bucket 59 00:02:45,950 --> 00:02:48,980 as we've seen with the divide and conquer approach 60 00:02:48,980 --> 00:02:51,100 or we could use another sort algorithm 61 00:02:51,100 --> 00:02:55,550 like insertion sort to sort the elements within each bucket. 62 00:02:55,550 --> 00:02:57,640 And there are times when you'd like to implement 63 00:02:57,640 --> 00:03:00,330 bucket sort in different ways 64 00:03:00,330 --> 00:03:03,230 and different sorting algorithms might be appropriate 65 00:03:03,230 --> 00:03:04,143 at this level. 66 00:03:06,270 --> 00:03:09,310 Now to the question of how many buckets to use. 67 00:03:09,310 --> 00:03:12,070 Obviously we don't want just one, that would be no help. 68 00:03:12,070 --> 00:03:13,580 We take the whole input vector, 69 00:03:13,580 --> 00:03:16,930 dump it into a single bucket and nothing would be changed. 70 00:03:16,930 --> 00:03:18,350 And we don't want the same number 71 00:03:18,350 --> 00:03:20,680 of buckets as there are elements in the vector 72 00:03:20,680 --> 00:03:22,410 that too would be no help. 73 00:03:22,410 --> 00:03:24,770 So the answer is, again, it depends 74 00:03:24,770 --> 00:03:26,653 and there's no hard and fast rule. 75 00:03:27,500 --> 00:03:31,130 It depends on the nature and distribution of your data. 76 00:03:31,130 --> 00:03:33,000 What do I mean by that? 77 00:03:33,000 --> 00:03:34,603 Let's look at some examples. 78 00:03:35,480 --> 00:03:39,900 Here we have a vector 3,000, 1,000 zero and 2,000 79 00:03:40,770 --> 00:03:42,770 and we wanna sort this using bucket sort 80 00:03:43,680 --> 00:03:47,290 but notice that the values are pretty well spread out. 81 00:03:47,290 --> 00:03:49,550 They range from 3000 to zero. 82 00:03:49,550 --> 00:03:51,480 They are only four values. 83 00:03:51,480 --> 00:03:53,070 So that's kind of sparse. 84 00:03:53,070 --> 00:03:54,653 We'd call that sparse. 85 00:03:55,510 --> 00:03:56,800 And if your data are sparse 86 00:03:56,800 --> 00:03:59,160 you don't wanna have too many buckets. 87 00:03:59,160 --> 00:04:01,960 Let's say we had this vector as data. 88 00:04:01,960 --> 00:04:03,450 And if we want to sort these 89 00:04:03,450 --> 00:04:06,800 it wouldn't do as much good to have 600 buckets. 90 00:04:06,800 --> 00:04:08,360 We'd wanna have something say 91 00:04:08,360 --> 00:04:10,193 between three and five buckets. 92 00:04:11,330 --> 00:04:13,180 Now look at this vector. 93 00:04:13,180 --> 00:04:14,650 This is very different. 94 00:04:14,650 --> 00:04:16,400 What if we had data like this, 95 00:04:16,400 --> 00:04:19,600 densely clustered around a single value. 96 00:04:19,600 --> 00:04:23,560 Notice that all the values are pretty close to 2000. 97 00:04:23,560 --> 00:04:27,570 There's a big gap from zero to 1998. 98 00:04:27,570 --> 00:04:30,600 There's nothing beyond 2010 99 00:04:30,600 --> 00:04:35,233 and everything is closely clustered around that value 2000. 100 00:04:37,070 --> 00:04:39,990 So what happens if we have data like this? 101 00:04:39,990 --> 00:04:41,950 If we make our buckets too big 102 00:04:41,950 --> 00:04:45,260 then all these values might wind up in the same bucket. 103 00:04:45,260 --> 00:04:47,970 And as you might expect, worst case performance 104 00:04:47,970 --> 00:04:50,210 for bucket sort occurs when all the values 105 00:04:50,210 --> 00:04:51,793 fall into a single bucket. 106 00:04:53,530 --> 00:04:55,130 Which brings us to the question 107 00:04:55,130 --> 00:04:57,380 of Bucket Sort Complexity. 108 00:04:57,380 --> 00:05:00,770 Assuming data are reasonably uniformly distributed, 109 00:05:00,770 --> 00:05:04,500 assuming that they are, and that you have n elements 110 00:05:04,500 --> 00:05:06,170 and m buckets, 111 00:05:06,170 --> 00:05:08,310 distributing takes n steps 112 00:05:08,310 --> 00:05:09,290 because we have n elements 113 00:05:09,290 --> 00:05:11,773 we have to dish them out into so many buckets. 114 00:05:12,610 --> 00:05:16,660 Sorting each bucket takes f of n divided by n steps 115 00:05:16,660 --> 00:05:18,330 where f is the runtime 116 00:05:18,330 --> 00:05:22,000 of the function used to sort the buckets. 117 00:05:22,000 --> 00:05:23,200 So you recall, I said 118 00:05:23,200 --> 00:05:26,330 we could use a bucket sort recursively 119 00:05:26,330 --> 00:05:29,080 or we could use insertion sort. 120 00:05:29,080 --> 00:05:33,333 This step depends on the sorting algorithm that we choose. 121 00:05:34,410 --> 00:05:37,700 But with m buckets, we multiply and we get 122 00:05:37,700 --> 00:05:39,910 m times f of n divided by n 123 00:05:41,560 --> 00:05:43,400 and gathering takes n steps. 124 00:05:43,400 --> 00:05:46,750 We have n elements, it takes n steps to gather them. 125 00:05:46,750 --> 00:05:48,850 So we wind up with big O of n 126 00:05:49,700 --> 00:05:53,810 plus big O of m times f of n divided by n 127 00:05:53,810 --> 00:05:54,690 plus O of n 128 00:05:56,140 --> 00:05:56,973 and 129 00:05:58,030 --> 00:05:59,590 this, 130 00:05:59,590 --> 00:06:01,850 n divided by n, that's just the number 131 00:06:01,850 --> 00:06:04,000 of elements divided by the number of buckets. 132 00:06:04,000 --> 00:06:05,960 That's a constant. 133 00:06:05,960 --> 00:06:10,130 And we have two O of n terms that gets us O of two n 134 00:06:10,130 --> 00:06:12,960 and we just ignore the leading coefficient. 135 00:06:12,960 --> 00:06:17,143 And so this whole thing simplifies to big O of n plus n. 136 00:06:18,240 --> 00:06:19,490 And notice that this means 137 00:06:19,490 --> 00:06:22,633 that performance depends on the number of buckets. 138 00:06:23,710 --> 00:06:25,780 Also remember that this only holds true 139 00:06:25,780 --> 00:06:28,410 if the assumption we gave before is true, 140 00:06:28,410 --> 00:06:31,343 that data are reasonably uniformly distributed. 141 00:06:32,420 --> 00:06:35,003 Now let's add bucket sort to our comparison chart. 142 00:06:35,940 --> 00:06:37,440 Here we have bucket sort. 143 00:06:37,440 --> 00:06:41,790 We see that it has big O of n plus m time complexity 144 00:06:41,790 --> 00:06:46,210 big O of n plus m space complexity. 145 00:06:46,210 --> 00:06:49,470 And it depends, whether it's stable or not. 146 00:06:49,470 --> 00:06:51,070 What does it depend on? 147 00:06:51,070 --> 00:06:53,020 It depends on the sorting algorithm 148 00:06:53,020 --> 00:06:56,290 that's used to sort the elements within each bucket. 149 00:06:56,290 --> 00:06:58,343 If we choose a stable sorting algorithm 150 00:06:58,343 --> 00:07:00,600 then bucket sort will be stable. 151 00:07:00,600 --> 00:07:03,010 If we choose an unstable sorting algorithm 152 00:07:03,010 --> 00:07:04,993 than bucket sort will not be stable. 153 00:07:05,900 --> 00:07:07,970 And this note is important 154 00:07:07,970 --> 00:07:10,970 bucket sort requires that certain properties 155 00:07:10,970 --> 00:07:13,903 hold for the data in order to be effective. 156 00:07:15,370 --> 00:07:17,850 Now, let's look at a cousin of bucket sort 157 00:07:17,850 --> 00:07:19,053 called radix sort. 158 00:07:20,440 --> 00:07:23,620 Radix sort is a related algorithm that works with data 159 00:07:23,620 --> 00:07:25,930 that could be sorted lexicographically. 160 00:07:25,930 --> 00:07:26,830 What does that mean? 161 00:07:26,830 --> 00:07:28,013 Lexicographically? 162 00:07:28,850 --> 00:07:31,040 Well, lexicographic sorting is just like 163 00:07:31,040 --> 00:07:33,340 sorting words in a dictionary. 164 00:07:33,340 --> 00:07:34,660 In a dictionary, 165 00:07:34,660 --> 00:07:36,920 we sort on the first letter 166 00:07:36,920 --> 00:07:40,190 and then on the second letter and so on. 167 00:07:40,190 --> 00:07:43,120 So aardvark comes before Abacus. 168 00:07:43,120 --> 00:07:45,480 That's what we mean by lexicographic sorting, 169 00:07:45,480 --> 00:07:47,023 it's dictionary sorting. 170 00:07:48,280 --> 00:07:50,850 And here we'll take a look at sorting numbers 171 00:07:50,850 --> 00:07:52,420 based on their digits 172 00:07:52,420 --> 00:07:55,100 but we're going to work from the least significant digit 173 00:07:55,100 --> 00:07:57,380 to the most significant digit. 174 00:07:57,380 --> 00:08:00,390 This is called LSD radix sort. 175 00:08:00,390 --> 00:08:03,800 And radix is just another word for base 176 00:08:03,800 --> 00:08:06,090 as in base two for binary 177 00:08:06,090 --> 00:08:08,830 or base 10 in our everyday counting system. 178 00:08:08,830 --> 00:08:11,130 The radix is the number of digits 179 00:08:11,130 --> 00:08:13,190 that we have in our system. 180 00:08:13,190 --> 00:08:16,690 So for base 10, we have zero through nine 181 00:08:16,690 --> 00:08:21,110 and radix sort complexity is big O of n times d 182 00:08:21,110 --> 00:08:23,750 where d is the maximum number of digits 183 00:08:23,750 --> 00:08:26,233 among all the numbers we wish to sort. 184 00:08:27,470 --> 00:08:31,920 So radix sort can outperform big O of n log n algorithms 185 00:08:31,920 --> 00:08:36,633 whenever d, the number of digits is less than log n. 186 00:08:37,570 --> 00:08:40,933 Now it may surprise you, but radix sort is old. 187 00:08:42,000 --> 00:08:43,720 Here's a tabulating machine 188 00:08:43,720 --> 00:08:46,460 that was used to sort punched cards 189 00:08:46,460 --> 00:08:51,093 in the 1890 census using the radix sorting algorithm. 190 00:08:51,960 --> 00:08:54,760 And while it's been around a long time 191 00:08:54,760 --> 00:08:57,250 it wasn't until the mid 1950s 192 00:08:57,250 --> 00:08:59,360 that an efficient implementation 193 00:08:59,360 --> 00:09:01,980 was invented to run on a computer. 194 00:09:01,980 --> 00:09:05,580 So we've known about radix sort for quite some time 195 00:09:05,580 --> 00:09:08,880 but it's only been within the last 50 or 60 years 196 00:09:08,880 --> 00:09:12,043 that we've been able to run it effectively on computers. 197 00:09:14,610 --> 00:09:16,910 Here's pseudocode for radix sort 198 00:09:16,910 --> 00:09:19,720 and note that this is for lexicographic sorting 199 00:09:19,720 --> 00:09:22,350 of integers by digit. 200 00:09:22,350 --> 00:09:24,600 We take an input vector, 201 00:09:24,600 --> 00:09:28,420 we're working in base 10, so we make 10 buckets. 202 00:09:28,420 --> 00:09:30,290 And then for each digit 203 00:09:30,290 --> 00:09:34,650 let's say our largest number is 4,212 204 00:09:34,650 --> 00:09:39,350 we have four digits, for each digit for each element 205 00:09:39,350 --> 00:09:41,870 we distribute into the appropriate bucket 206 00:09:41,870 --> 00:09:43,640 based on the digit. 207 00:09:43,640 --> 00:09:44,920 And then for each bucket 208 00:09:44,920 --> 00:09:47,960 we gather the results back into the original vector. 209 00:09:47,960 --> 00:09:50,263 And we repeat that for each digit. 210 00:09:51,290 --> 00:09:53,430 Let's see what that means. 211 00:09:53,430 --> 00:09:55,700 Let's say we have this vector 212 00:09:55,700 --> 00:09:58,010 one nine two three seven nine and so on, 213 00:09:58,010 --> 00:10:02,220 notice that I've inserted a leading zero here 214 00:10:02,220 --> 00:10:04,430 two leading zeros here, 215 00:10:04,430 --> 00:10:06,300 leading zero here 216 00:10:06,300 --> 00:10:08,470 to show that 217 00:10:08,470 --> 00:10:11,030 when we sort on this digit 218 00:10:11,030 --> 00:10:12,570 that this value is zero. 219 00:10:12,570 --> 00:10:15,973 And when we sort on this digit, this value is zero. 220 00:10:17,160 --> 00:10:19,900 So for our first sort, remember that we're working 221 00:10:19,900 --> 00:10:23,510 from least significant digit to most significant digit. 222 00:10:23,510 --> 00:10:27,070 We're gonna sort on these digits in red 223 00:10:27,070 --> 00:10:31,350 two, nine, two, seven, four, five, four, three. 224 00:10:31,350 --> 00:10:35,120 And if we do that, then we get this result. 225 00:10:35,120 --> 00:10:36,670 This is our resulting vector. 226 00:10:36,670 --> 00:10:39,320 Notice that the twos begin for us 227 00:10:39,320 --> 00:10:43,223 then the threes, the fours, the fives, seven and nine. 228 00:10:44,320 --> 00:10:47,480 Then the next sort, we're going to sort 229 00:10:47,480 --> 00:10:49,560 by the second digit 230 00:10:49,560 --> 00:10:52,830 nine, one, zero, zero and so on. 231 00:10:52,830 --> 00:10:56,980 And if we do that, we'll get this resulting vector, 232 00:10:56,980 --> 00:10:59,630 notice now that the zeros are leading, 233 00:10:59,630 --> 00:11:04,023 followed by the ones, a five, two sevens and a nine. 234 00:11:05,360 --> 00:11:08,030 Finally, since we have three digits 235 00:11:08,030 --> 00:11:10,320 we sort on that third digit 236 00:11:10,320 --> 00:11:11,500 and now see we have 237 00:11:11,500 --> 00:11:16,500 two, zero, zero, zero, four, two, three, one. 238 00:11:16,520 --> 00:11:19,140 And if we sort on that third digit 239 00:11:20,450 --> 00:11:21,710 we get this result 240 00:11:22,610 --> 00:11:27,610 zero, zero, zero, one, two, two, three, four. 241 00:11:28,390 --> 00:11:30,020 And now you'll notice 242 00:11:31,130 --> 00:11:33,720 that our vector is in sorted order. 243 00:11:33,720 --> 00:11:37,403 So we sorted our vector without performing any comparisons. 244 00:11:39,390 --> 00:11:41,380 Now, how do we get the value of a digit 245 00:11:41,380 --> 00:11:43,800 without performing comparisons? 246 00:11:43,800 --> 00:11:45,740 Well, we take the number 247 00:11:45,740 --> 00:11:49,920 we divide by the appropriate power of our radix or base, 248 00:11:49,920 --> 00:11:52,870 and then we take that value modular the radix. 249 00:11:52,870 --> 00:11:54,790 So if we're working in base 10, 250 00:11:54,790 --> 00:11:56,743 as we've been in these examples, 251 00:11:57,590 --> 00:12:01,380 to get the third digit of 2,708 252 00:12:01,380 --> 00:12:04,290 we integer divide by 10 to the two, 253 00:12:04,290 --> 00:12:09,140 notice that we zero index the digits, which gives us 27. 254 00:12:09,140 --> 00:12:13,750 And then we take 27 modular 10, which is our radix, 255 00:12:13,750 --> 00:12:15,600 and that yields seven. 256 00:12:15,600 --> 00:12:17,940 And so the third digit 257 00:12:17,940 --> 00:12:20,410 of 2,708 258 00:12:20,410 --> 00:12:21,510 is seven. 259 00:12:21,510 --> 00:12:24,800 So we now know that on our third iteration 260 00:12:24,800 --> 00:12:29,160 2,708 will get distributed to the sevens bucket. 261 00:12:29,160 --> 00:12:32,293 Notice that the buckets are also zero indexed. 262 00:12:33,160 --> 00:12:35,123 Let's take a look on visual algo. 263 00:12:36,810 --> 00:12:40,673 Notice that we have input values from one to 9,680. 264 00:12:43,390 --> 00:12:46,290 We create those 10 buckets 265 00:12:47,530 --> 00:12:49,310 and they're using red 266 00:12:49,310 --> 00:12:52,010 to indicate the digit that we're sorting on, 267 00:12:52,010 --> 00:12:53,883 the least significant digit. 268 00:12:55,039 --> 00:12:57,433 And then we dish them back out to the vector. 269 00:13:01,110 --> 00:13:03,290 Then we do that again for the second digit 270 00:13:13,346 --> 00:13:16,800 and notice that we take the elements out of the buckets 271 00:13:16,800 --> 00:13:18,623 in the order that they went in. 272 00:13:21,060 --> 00:13:23,000 Now we do it for the third digit 273 00:13:32,314 --> 00:13:34,000 and dish them back out to the vector, 274 00:13:34,000 --> 00:13:36,510 and then we'll make one more pass for the fourth digit, 275 00:13:36,510 --> 00:13:37,610 and we'll be complete. 276 00:13:54,520 --> 00:13:57,233 We skip the empty buckets. That's simple, 277 00:13:59,600 --> 00:14:00,550 and now we're done. 278 00:14:05,070 --> 00:14:07,660 Now we return to our comparison chart 279 00:14:07,660 --> 00:14:10,040 and we've added radix sort. 280 00:14:10,040 --> 00:14:14,150 We see that the time complexity is big O of n times d 281 00:14:14,150 --> 00:14:18,470 where d is the number of digits that we have to work with. 282 00:14:18,470 --> 00:14:21,680 And n is the number of elements in our vector. 283 00:14:21,680 --> 00:14:23,310 Space complexity 284 00:14:23,310 --> 00:14:25,220 is O of n plus d 285 00:14:26,170 --> 00:14:28,310 radix sort is stable 286 00:14:28,310 --> 00:14:30,350 but it also requires that the data 287 00:14:30,350 --> 00:14:32,410 can be sorted lexicographically. 288 00:14:32,410 --> 00:14:34,763 It won't work with any other kind of data. 289 00:14:36,700 --> 00:14:40,170 So in summary, bucket sort and radix sort, 290 00:14:40,170 --> 00:14:43,400 perform sorting without making comparisons. 291 00:14:43,400 --> 00:14:45,193 They distribute and collect. 292 00:14:46,070 --> 00:14:49,380 They can outperform big O of n log n algorithms, 293 00:14:49,380 --> 00:14:51,630 but only under certain circumstances 294 00:14:51,630 --> 00:14:54,170 the data have to have certain properties, 295 00:14:54,170 --> 00:14:56,320 both in terms of the data type 296 00:14:56,320 --> 00:14:58,033 and the distribution of the data. 297 00:14:59,080 --> 00:15:02,260 In the next video, we'll implement a version 298 00:15:02,260 --> 00:15:04,700 of radix sort in C plus plus. 299 00:15:04,700 --> 00:15:05,653 That's all for now.