1 00:00:01,090 --> 00:00:04,410 - [Instructor] Okay, so we're continuing with hash tables. 2 00:00:04,410 --> 00:00:07,920 And now we're gonna look at what's called quadratic probing. 3 00:00:07,920 --> 00:00:10,430 I remember the first time I heard about quadratic probing, 4 00:00:10,430 --> 00:00:12,720 it sounded so exotic. 5 00:00:12,720 --> 00:00:13,630 So scary. 6 00:00:13,630 --> 00:00:14,980 What is quadratic probing? 7 00:00:14,980 --> 00:00:17,750 Oh my gosh. Quadratic probing. 8 00:00:17,750 --> 00:00:19,380 Anyhow, it's no big deal. 9 00:00:19,380 --> 00:00:20,850 So, we'll see what it is 10 00:00:20,850 --> 00:00:23,600 and it's actually very easy to implement. 11 00:00:23,600 --> 00:00:27,930 So when it comes time for us to modify our hash table 12 00:00:27,930 --> 00:00:30,880 to implement quadratic probing, 13 00:00:30,880 --> 00:00:33,213 we'll only have to change a couple of lines of code. 14 00:00:34,560 --> 00:00:37,410 So far we've seen two collision resolution policies; 15 00:00:37,410 --> 00:00:39,730 separate chaining and linear probing. 16 00:00:39,730 --> 00:00:41,840 And quadratic probing is just another approach 17 00:00:41,840 --> 00:00:43,800 to resolving hash collisions. 18 00:00:43,800 --> 00:00:46,360 But of course, with any approach, 19 00:00:46,360 --> 00:00:48,680 there's a problem to be solved. 20 00:00:48,680 --> 00:00:49,990 And what is the problem? 21 00:00:49,990 --> 00:00:52,850 Well, we've seen that linear probing is prone 22 00:00:52,850 --> 00:00:55,290 to what's called primary clustering 23 00:00:55,290 --> 00:00:57,480 and quadratic probing is designed 24 00:00:57,480 --> 00:01:00,220 to eliminate primary cluster. 25 00:01:00,220 --> 00:01:03,083 Let's remind ourselves what primary clustering is. 26 00:01:05,100 --> 00:01:07,200 You'll recall that in the case of linear probing, 27 00:01:07,200 --> 00:01:09,450 we just probe from one element to the next, 28 00:01:09,450 --> 00:01:10,560 with a fixed stride. 29 00:01:10,560 --> 00:01:14,653 When we hit occupied positions in our hash table. 30 00:01:17,530 --> 00:01:20,620 Say we insert an object at position zero, 31 00:01:20,620 --> 00:01:22,520 the hash function returns zero 32 00:01:22,520 --> 00:01:25,110 and it tells us to put it there, so we do. 33 00:01:25,110 --> 00:01:28,330 Now the next time our hash function returns zero, 34 00:01:28,330 --> 00:01:29,963 we're gonna insert at one. 35 00:01:32,510 --> 00:01:34,003 That's our linear probing. 36 00:01:34,920 --> 00:01:39,730 But now what happens is that the next time 37 00:01:39,730 --> 00:01:42,620 our hash function returns a zero or a one, 38 00:01:42,620 --> 00:01:44,593 we're gonna insert at two. 39 00:01:46,400 --> 00:01:47,233 Okay. 40 00:01:48,130 --> 00:01:52,750 And now, if our hash function returns zero one or two, 41 00:01:52,750 --> 00:01:54,520 we're gonna wind up at three. 42 00:01:54,520 --> 00:01:56,740 Any of those values will get us to three, 43 00:01:56,740 --> 00:01:59,680 you can see how this primary cluster builds. 44 00:01:59,680 --> 00:02:02,410 Now, any hash value, zero, one, 45 00:02:02,410 --> 00:02:05,720 two, or three, is gonna land us at four. 46 00:02:05,720 --> 00:02:10,300 Now you can enter the probe sequence at any point. 47 00:02:10,300 --> 00:02:12,880 If we insert now with any value, one through five, 48 00:02:12,880 --> 00:02:16,460 we'll wind up at six, and therein lies the problem. 49 00:02:16,460 --> 00:02:19,870 If we hit any one of these elements 50 00:02:19,870 --> 00:02:21,810 within our primary cluster, 51 00:02:21,810 --> 00:02:25,070 we're gonna have to enter that same chain 52 00:02:25,070 --> 00:02:28,110 and we're gonna wind up at the next empty position. 53 00:02:28,110 --> 00:02:29,743 Okay, and we wanna avoid that. 54 00:02:30,910 --> 00:02:33,340 So, whereas with linear probing, 55 00:02:33,340 --> 00:02:38,340 we have this fixed stride of one typically, 56 00:02:38,390 --> 00:02:41,053 wherever we just step from position to position. 57 00:02:42,220 --> 00:02:46,363 With quadratic probing, our stride varies at each step. 58 00:02:47,560 --> 00:02:49,220 So what's goin' on here? 59 00:02:49,220 --> 00:02:51,770 We've got a little step and then a little bigger step 60 00:02:51,770 --> 00:02:54,100 and then a little bigger step. 61 00:02:54,100 --> 00:02:55,870 And it's very simple. 62 00:02:55,870 --> 00:02:58,360 We just keep track of the number of steps that we're taking. 63 00:02:58,360 --> 00:03:02,960 So on the first, we take one squared that gets us one. 64 00:03:02,960 --> 00:03:04,980 So our stride is one. 65 00:03:04,980 --> 00:03:08,170 On our second step, we take two squared, 66 00:03:08,170 --> 00:03:10,810 which is four and our stride is four. 67 00:03:10,810 --> 00:03:13,410 One, two, three, four. 68 00:03:13,410 --> 00:03:16,890 On our third step, we take three squared, that gets us nine. 69 00:03:16,890 --> 00:03:18,150 So our stride is nine. 70 00:03:18,150 --> 00:03:22,430 One, two, three, four, five, 71 00:03:22,430 --> 00:03:25,270 six, seven, eight, nine. 72 00:03:25,270 --> 00:03:26,770 Our next one would be 16. 73 00:03:26,770 --> 00:03:27,603 You know, what would happen? 74 00:03:27,603 --> 00:03:29,700 We'd wrap around and it turns out 75 00:03:29,700 --> 00:03:32,150 we'd wind up at a position 13, 76 00:03:32,150 --> 00:03:35,480 but that's the idea behind quadratic probing. 77 00:03:35,480 --> 00:03:38,680 So it distributes the probe sequence 78 00:03:38,680 --> 00:03:40,023 throughout the hash table. 79 00:03:41,830 --> 00:03:45,080 So again, to go back to our primary clustering, 80 00:03:45,080 --> 00:03:48,520 this is our primary cluster with linear probing. 81 00:03:48,520 --> 00:03:50,640 And again, we can enter at any point 82 00:03:50,640 --> 00:03:54,290 and we're gonna pick up somewhere along this chain 83 00:03:54,290 --> 00:03:57,140 and we're gonna wind up at the next open position. 84 00:03:57,140 --> 00:03:59,840 In the case of linear probing and primary clustering, 85 00:03:59,840 --> 00:04:03,400 we follow the same probe sequence or some portion thereof, 86 00:04:03,400 --> 00:04:05,853 no matter where we enter the primary cluster. 87 00:04:07,000 --> 00:04:09,820 And in the case of quadratic probing, 88 00:04:09,820 --> 00:04:14,820 a secondary cluster, which is these yellow elements 89 00:04:15,340 --> 00:04:17,600 is actually distributed throughout the table. 90 00:04:17,600 --> 00:04:20,590 And we only follow that same probe sequence 91 00:04:20,590 --> 00:04:24,100 if we start at the same initial point. 92 00:04:24,100 --> 00:04:29,100 So if we were to get a hash value of five, 93 00:04:29,400 --> 00:04:32,340 we wouldn't follow this same probing sequence. 94 00:04:32,340 --> 00:04:36,340 We do six, and then one, two, three, four, 95 00:04:36,340 --> 00:04:38,010 would get us to 10. 96 00:04:38,010 --> 00:04:41,410 And then one, two, three, four, five, 97 00:04:41,410 --> 00:04:45,290 six, seven, eight, nine, would get us to two. 98 00:04:45,290 --> 00:04:49,083 So you see, it avoids the problem of this primary cluster. 99 00:04:51,140 --> 00:04:54,173 Let's take a look at a CRPs. 100 00:04:54,173 --> 00:04:56,300 CRP, what does that stand for? 101 00:04:56,300 --> 00:04:58,320 That's collision resolution policy. 102 00:04:58,320 --> 00:04:59,310 I had to abbreviate it 103 00:04:59,310 --> 00:05:01,810 'cause otherwise it wasn't gonna fit on the slide. 104 00:05:02,670 --> 00:05:06,920 So we have linear probing and quadratic probing, 105 00:05:06,920 --> 00:05:10,980 in both cases, we probe when we hit a collision. 106 00:05:10,980 --> 00:05:12,870 There's a fixed upper limit on the number 107 00:05:12,870 --> 00:05:14,430 of objects that we can insert. 108 00:05:14,430 --> 00:05:15,840 That's the size of our hash table, 109 00:05:15,840 --> 00:05:18,510 which is different from separate chaining as you recall, 110 00:05:18,510 --> 00:05:19,640 which is only limited by 111 00:05:19,640 --> 00:05:21,890 the memory or the system constraints. 112 00:05:21,890 --> 00:05:26,650 Linear probing has a fixed stride, typically one, 113 00:05:26,650 --> 00:05:30,850 and with quadratic probing, the stride changes on each step. 114 00:05:30,850 --> 00:05:33,343 It's the counter for the step, squared. 115 00:05:34,550 --> 00:05:38,600 And whereas linear probing is prone to primary clustering, 116 00:05:38,600 --> 00:05:42,080 secondary clustering happens with quadratic probing, 117 00:05:42,080 --> 00:05:45,200 but again, those clusters are distributed 118 00:05:45,200 --> 00:05:48,963 throughout the tables, so they become less of a problem. 119 00:05:50,940 --> 00:05:53,330 In summary, like linear probing 120 00:05:53,330 --> 00:05:55,230 and unlike separate chaining, 121 00:05:55,230 --> 00:05:58,293 we only allow a single object at a given index. 122 00:05:59,170 --> 00:06:01,760 Upon hash collisions, we probe our hash table, 123 00:06:01,760 --> 00:06:05,260 one step at a time until we find an empty position, 124 00:06:05,260 --> 00:06:08,010 but our stride changes on each step. 125 00:06:08,010 --> 00:06:10,470 And like linear probing and unlike separate chaining, 126 00:06:10,470 --> 00:06:12,890 quadratic probing also has a fixed limit 127 00:06:12,890 --> 00:06:14,750 on the number of objects that we can insert, 128 00:06:14,750 --> 00:06:17,300 so rehashing is gonna be a thing 129 00:06:17,300 --> 00:06:19,763 when we're doing quadratic probing as well. 130 00:06:20,610 --> 00:06:23,900 We'll implement this in C plus plus in project five, 131 00:06:23,900 --> 00:06:27,580 but I think you'll see that this is a pretty simple concept 132 00:06:27,580 --> 00:06:29,460 and it does a good job of 133 00:06:29,460 --> 00:06:31,720 breaking up those primary clusters. 134 00:06:31,720 --> 00:06:35,460 And in general, keeps the probe sequences shorter 135 00:06:35,460 --> 00:06:37,770 than if we were using a linear probing. 136 00:06:37,770 --> 00:06:40,593 And that's the problem it's intended the solve. 137 00:06:41,440 --> 00:06:43,970 Later on, we'll look at another 138 00:06:43,970 --> 00:06:47,880 collision resolution policy called double hashing, 139 00:06:47,880 --> 00:06:51,560 which is designed to avoid the secondary cluster. 140 00:06:51,560 --> 00:06:54,360 But we'll see that in another video, that's all for now.