1
00:00:01,090 --> 00:00:04,410
- [Instructor] Okay, so we're
continuing with hash tables.

2
00:00:04,410 --> 00:00:07,920
And now we're gonna look at
what's called quadratic probing.

3
00:00:07,920 --> 00:00:10,430
I remember the first time I
heard about quadratic probing,

4
00:00:10,430 --> 00:00:12,720
it sounded so exotic.

5
00:00:12,720 --> 00:00:13,630
So scary.

6
00:00:13,630 --> 00:00:14,980
What is quadratic probing?

7
00:00:14,980 --> 00:00:17,750
Oh my gosh. Quadratic probing.

8
00:00:17,750 --> 00:00:19,380
Anyhow, it's no big deal.

9
00:00:19,380 --> 00:00:20,850
So, we'll see what it is

10
00:00:20,850 --> 00:00:23,600
and it's actually very easy to implement.

11
00:00:23,600 --> 00:00:27,930
So when it comes time for
us to modify our hash table

12
00:00:27,930 --> 00:00:30,880
to implement quadratic probing,

13
00:00:30,880 --> 00:00:33,213
we'll only have to change
a couple of lines of code.

14
00:00:34,560 --> 00:00:37,410
So far we've seen two
collision resolution policies;

15
00:00:37,410 --> 00:00:39,730
separate chaining and linear probing.

16
00:00:39,730 --> 00:00:41,840
And quadratic probing
is just another approach

17
00:00:41,840 --> 00:00:43,800
to resolving hash collisions.

18
00:00:43,800 --> 00:00:46,360
But of course, with any approach,

19
00:00:46,360 --> 00:00:48,680
there's a problem to be solved.

20
00:00:48,680 --> 00:00:49,990
And what is the problem?

21
00:00:49,990 --> 00:00:52,850
Well, we've seen that
linear probing is prone

22
00:00:52,850 --> 00:00:55,290
to what's called primary clustering

23
00:00:55,290 --> 00:00:57,480
and quadratic probing is designed

24
00:00:57,480 --> 00:01:00,220
to eliminate primary cluster.

25
00:01:00,220 --> 00:01:03,083
Let's remind ourselves
what primary clustering is.

26
00:01:05,100 --> 00:01:07,200
You'll recall that in the
case of linear probing,

27
00:01:07,200 --> 00:01:09,450
we just probe from one
element to the next,

28
00:01:09,450 --> 00:01:10,560
with a fixed stride.

29
00:01:10,560 --> 00:01:14,653
When we hit occupied
positions in our hash table.

30
00:01:17,530 --> 00:01:20,620
Say we insert an object at position zero,

31
00:01:20,620 --> 00:01:22,520
the hash function returns zero

32
00:01:22,520 --> 00:01:25,110
and it tells us to put it there, so we do.

33
00:01:25,110 --> 00:01:28,330
Now the next time our hash
function returns zero,

34
00:01:28,330 --> 00:01:29,963
we're gonna insert at one.

35
00:01:32,510 --> 00:01:34,003
That's our linear probing.

36
00:01:34,920 --> 00:01:39,730
But now what happens is that the next time

37
00:01:39,730 --> 00:01:42,620
our hash function returns a zero or a one,

38
00:01:42,620 --> 00:01:44,593
we're gonna insert at two.

39
00:01:46,400 --> 00:01:47,233
Okay.

40
00:01:48,130 --> 00:01:52,750
And now, if our hash function
returns zero one or two,

41
00:01:52,750 --> 00:01:54,520
we're gonna wind up at three.

42
00:01:54,520 --> 00:01:56,740
Any of those values will get us to three,

43
00:01:56,740 --> 00:01:59,680
you can see how this
primary cluster builds.

44
00:01:59,680 --> 00:02:02,410
Now, any hash value, zero, one,

45
00:02:02,410 --> 00:02:05,720
two, or three, is gonna land us at four.

46
00:02:05,720 --> 00:02:10,300
Now you can enter the probe
sequence at any point.

47
00:02:10,300 --> 00:02:12,880
If we insert now with any
value, one through five,

48
00:02:12,880 --> 00:02:16,460
we'll wind up at six, and
therein lies the problem.

49
00:02:16,460 --> 00:02:19,870
If we hit any one of these elements

50
00:02:19,870 --> 00:02:21,810
within our primary cluster,

51
00:02:21,810 --> 00:02:25,070
we're gonna have to enter that same chain

52
00:02:25,070 --> 00:02:28,110
and we're gonna wind up at
the next empty position.

53
00:02:28,110 --> 00:02:29,743
Okay, and we wanna avoid that.

54
00:02:30,910 --> 00:02:33,340
So, whereas with linear probing,

55
00:02:33,340 --> 00:02:38,340
we have this fixed
stride of one typically,

56
00:02:38,390 --> 00:02:41,053
wherever we just step
from position to position.

57
00:02:42,220 --> 00:02:46,363
With quadratic probing, our
stride varies at each step.

58
00:02:47,560 --> 00:02:49,220
So what's goin' on here?

59
00:02:49,220 --> 00:02:51,770
We've got a little step and
then a little bigger step

60
00:02:51,770 --> 00:02:54,100
and then a little bigger step.

61
00:02:54,100 --> 00:02:55,870
And it's very simple.

62
00:02:55,870 --> 00:02:58,360
We just keep track of the number
of steps that we're taking.

63
00:02:58,360 --> 00:03:02,960
So on the first, we take one
squared that gets us one.

64
00:03:02,960 --> 00:03:04,980
So our stride is one.

65
00:03:04,980 --> 00:03:08,170
On our second step, we take two squared,

66
00:03:08,170 --> 00:03:10,810
which is four and our stride is four.

67
00:03:10,810 --> 00:03:13,410
One, two, three, four.

68
00:03:13,410 --> 00:03:16,890
On our third step, we take three
squared, that gets us nine.

69
00:03:16,890 --> 00:03:18,150
So our stride is nine.

70
00:03:18,150 --> 00:03:22,430
One, two, three, four, five,

71
00:03:22,430 --> 00:03:25,270
six, seven, eight, nine.

72
00:03:25,270 --> 00:03:26,770
Our next one would be 16.

73
00:03:26,770 --> 00:03:27,603
You know, what would happen?

74
00:03:27,603 --> 00:03:29,700
We'd wrap around and it turns out

75
00:03:29,700 --> 00:03:32,150
we'd wind up at a position 13,

76
00:03:32,150 --> 00:03:35,480
but that's the idea
behind quadratic probing.

77
00:03:35,480 --> 00:03:38,680
So it distributes the probe sequence

78
00:03:38,680 --> 00:03:40,023
throughout the hash table.

79
00:03:41,830 --> 00:03:45,080
So again, to go back to
our primary clustering,

80
00:03:45,080 --> 00:03:48,520
this is our primary cluster
with linear probing.

81
00:03:48,520 --> 00:03:50,640
And again, we can enter at any point

82
00:03:50,640 --> 00:03:54,290
and we're gonna pick up
somewhere along this chain

83
00:03:54,290 --> 00:03:57,140
and we're gonna wind up
at the next open position.

84
00:03:57,140 --> 00:03:59,840
In the case of linear probing
and primary clustering,

85
00:03:59,840 --> 00:04:03,400
we follow the same probe
sequence or some portion thereof,

86
00:04:03,400 --> 00:04:05,853
no matter where we enter
the primary cluster.

87
00:04:07,000 --> 00:04:09,820
And in the case of quadratic probing,

88
00:04:09,820 --> 00:04:14,820
a secondary cluster, which
is these yellow elements

89
00:04:15,340 --> 00:04:17,600
is actually distributed
throughout the table.

90
00:04:17,600 --> 00:04:20,590
And we only follow that
same probe sequence

91
00:04:20,590 --> 00:04:24,100
if we start at the same initial point.

92
00:04:24,100 --> 00:04:29,100
So if we were to get a hash value of five,

93
00:04:29,400 --> 00:04:32,340
we wouldn't follow this
same probing sequence.

94
00:04:32,340 --> 00:04:36,340
We do six, and then one, two, three, four,

95
00:04:36,340 --> 00:04:38,010
would get us to 10.

96
00:04:38,010 --> 00:04:41,410
And then one, two, three, four, five,

97
00:04:41,410 --> 00:04:45,290
six, seven, eight, nine,
would get us to two.

98
00:04:45,290 --> 00:04:49,083
So you see, it avoids the
problem of this primary cluster.

99
00:04:51,140 --> 00:04:54,173
Let's take a look at a CRPs.

100
00:04:54,173 --> 00:04:56,300
CRP, what does that stand for?

101
00:04:56,300 --> 00:04:58,320
That's collision resolution policy.

102
00:04:58,320 --> 00:04:59,310
I had to abbreviate it

103
00:04:59,310 --> 00:05:01,810
'cause otherwise it wasn't
gonna fit on the slide.

104
00:05:02,670 --> 00:05:06,920
So we have linear probing
and quadratic probing,

105
00:05:06,920 --> 00:05:10,980
in both cases, we probe
when we hit a collision.

106
00:05:10,980 --> 00:05:12,870
There's a fixed upper limit on the number

107
00:05:12,870 --> 00:05:14,430
of objects that we can insert.

108
00:05:14,430 --> 00:05:15,840
That's the size of our hash table,

109
00:05:15,840 --> 00:05:18,510
which is different from
separate chaining as you recall,

110
00:05:18,510 --> 00:05:19,640
which is only limited by

111
00:05:19,640 --> 00:05:21,890
the memory or the system constraints.

112
00:05:21,890 --> 00:05:26,650
Linear probing has a fixed
stride, typically one,

113
00:05:26,650 --> 00:05:30,850
and with quadratic probing, the
stride changes on each step.

114
00:05:30,850 --> 00:05:33,343
It's the counter for the step, squared.

115
00:05:34,550 --> 00:05:38,600
And whereas linear probing is
prone to primary clustering,

116
00:05:38,600 --> 00:05:42,080
secondary clustering happens
with quadratic probing,

117
00:05:42,080 --> 00:05:45,200
but again, those clusters are distributed

118
00:05:45,200 --> 00:05:48,963
throughout the tables, so
they become less of a problem.

119
00:05:50,940 --> 00:05:53,330
In summary, like linear probing

120
00:05:53,330 --> 00:05:55,230
and unlike separate chaining,

121
00:05:55,230 --> 00:05:58,293
we only allow a single
object at a given index.

122
00:05:59,170 --> 00:06:01,760
Upon hash collisions,
we probe our hash table,

123
00:06:01,760 --> 00:06:05,260
one step at a time until
we find an empty position,

124
00:06:05,260 --> 00:06:08,010
but our stride changes on each step.

125
00:06:08,010 --> 00:06:10,470
And like linear probing and
unlike separate chaining,

126
00:06:10,470 --> 00:06:12,890
quadratic probing also has a fixed limit

127
00:06:12,890 --> 00:06:14,750
on the number of objects
that we can insert,

128
00:06:14,750 --> 00:06:17,300
so rehashing is gonna be a thing

129
00:06:17,300 --> 00:06:19,763
when we're doing
quadratic probing as well.

130
00:06:20,610 --> 00:06:23,900
We'll implement this in C
plus plus in project five,

131
00:06:23,900 --> 00:06:27,580
but I think you'll see that
this is a pretty simple concept

132
00:06:27,580 --> 00:06:29,460
and it does a good job of

133
00:06:29,460 --> 00:06:31,720
breaking up those primary clusters.

134
00:06:31,720 --> 00:06:35,460
And in general, keeps the
probe sequences shorter

135
00:06:35,460 --> 00:06:37,770
than if we were using a linear probing.

136
00:06:37,770 --> 00:06:40,593
And that's the problem
it's intended the solve.

137
00:06:41,440 --> 00:06:43,970
Later on, we'll look at another

138
00:06:43,970 --> 00:06:47,880
collision resolution policy
called double hashing,

139
00:06:47,880 --> 00:06:51,560
which is designed to avoid
the secondary cluster.

140
00:06:51,560 --> 00:06:54,360
But we'll see that in another
video, that's all for now.