1
00:00:01,260 --> 00:00:02,370
- [Instructor] Now we're gonna continue

2
00:00:02,370 --> 00:00:03,930
our study of hash tables

3
00:00:03,930 --> 00:00:07,400
and look at another
collision resolution policy

4
00:00:07,400 --> 00:00:08,933
called linear probing.

5
00:00:10,250 --> 00:00:13,660
Earlier, we saw our first
collision resolution policy,

6
00:00:13,660 --> 00:00:15,490
separate chaining.

7
00:00:15,490 --> 00:00:18,030
Now linear probing is another approach

8
00:00:18,030 --> 00:00:19,913
to resolving hash collisions.

9
00:00:20,780 --> 00:00:22,630
Unlike separate chaining,

10
00:00:22,630 --> 00:00:25,800
however, we only allow a single object

11
00:00:25,800 --> 00:00:28,173
at a given index in our hash table.

12
00:00:29,080 --> 00:00:30,363
So how does it work?

13
00:00:31,570 --> 00:00:34,980
Well, the idea behind
linear probing is simple.

14
00:00:34,980 --> 00:00:36,830
If a collision occurs,

15
00:00:36,830 --> 00:00:40,450
we probe our hash table
taking one step at a time

16
00:00:40,450 --> 00:00:42,470
until we find an empty spot

17
00:00:42,470 --> 00:00:44,453
for the object we wish to insert.

18
00:00:45,320 --> 00:00:47,600
We call the step size the stride

19
00:00:47,600 --> 00:00:49,750
and it's typically set to one,

20
00:00:49,750 --> 00:00:52,930
but basically linear probing
is just what you might think

21
00:00:52,930 --> 00:00:54,050
given its name.

22
00:00:54,050 --> 00:00:55,160
On collisions,

23
00:00:55,160 --> 00:00:58,520
we probe in a linear step-wise fashion

24
00:00:58,520 --> 00:01:01,210
until we find a home for our new object.

25
00:01:01,210 --> 00:01:02,923
Let's look at a simple example.

26
00:01:03,890 --> 00:01:07,070
Here's an example similar
to the one we saw earlier.

27
00:01:07,070 --> 00:01:10,020
Again, we have a hash table of size seven.

28
00:01:10,020 --> 00:01:13,090
But in this case, unlike
separate chaining,

29
00:01:13,090 --> 00:01:16,753
each element in the hash table
can hold only one object.

30
00:01:18,220 --> 00:01:19,130
As before,

31
00:01:19,130 --> 00:01:21,980
we're gonna use the simplest
possible hash function

32
00:01:21,980 --> 00:01:24,080
for demonstration purposes,

33
00:01:24,080 --> 00:01:26,343
f(x) equals x mod seven.

34
00:01:27,270 --> 00:01:29,850
So let's insert 12,

35
00:01:29,850 --> 00:01:32,950
12 mod seven equals five.

36
00:01:32,950 --> 00:01:34,950
So 12 goes there.

37
00:01:34,950 --> 00:01:35,973
So far so good.

38
00:01:37,430 --> 00:01:39,420
Now we'll insert nine,

39
00:01:39,420 --> 00:01:41,223
nine mod seven is two.

40
00:01:42,060 --> 00:01:44,023
So nine is gonna go here.

41
00:01:45,330 --> 00:01:48,320
But what happens when
we want to insert 44?

42
00:01:48,320 --> 00:01:51,030
44 mod seven is also two.

43
00:01:51,030 --> 00:01:52,363
So we have a collision.

44
00:01:53,980 --> 00:01:55,550
Where does 44 go?

45
00:01:55,550 --> 00:01:57,800
Well, we start probing at two.

46
00:01:57,800 --> 00:01:59,870
We found that there's a collision.

47
00:01:59,870 --> 00:02:03,470
So we take one step to
the right, to three,

48
00:02:03,470 --> 00:02:07,323
and we insert 44 in the
first open position we find.

49
00:02:08,200 --> 00:02:09,700
That's linear probing.

50
00:02:09,700 --> 00:02:10,653
Let's continue.

51
00:02:11,860 --> 00:02:13,600
If we want to insert 30,

52
00:02:13,600 --> 00:02:15,750
30 mod seven is also two.

53
00:02:15,750 --> 00:02:17,950
So we have a collision again.

54
00:02:17,950 --> 00:02:19,070
What do we do?

55
00:02:19,070 --> 00:02:22,000
Well, we start our probe sequence at two,

56
00:02:22,000 --> 00:02:24,600
and we continue until we
find an empty position

57
00:02:24,600 --> 00:02:26,590
in which to insert our object.

58
00:02:26,590 --> 00:02:28,663
So 30 goes at index four.

59
00:02:30,670 --> 00:02:32,200
Now we want to insert two.

60
00:02:32,200 --> 00:02:34,470
Again, two mod seven is two,

61
00:02:34,470 --> 00:02:36,240
another collision.

62
00:02:36,240 --> 00:02:37,690
So what do we do?

63
00:02:37,690 --> 00:02:40,530
We probe until we find an empty spot.

64
00:02:40,530 --> 00:02:42,353
Now two goes at index six.

65
00:02:44,360 --> 00:02:46,600
Now we want to insert 16.

66
00:02:46,600 --> 00:02:48,100
Again, we have a collision

67
00:02:48,100 --> 00:02:50,160
because 16 mod seven is two.

68
00:02:50,160 --> 00:02:50,993
Now what do we do?

69
00:02:50,993 --> 00:02:54,110
We've run out of positions in our index.

70
00:02:54,110 --> 00:02:55,440
Do you think?

71
00:02:55,440 --> 00:02:57,240
No, I think we wrap around.

72
00:02:57,240 --> 00:02:58,623
So watch what happens.

73
00:02:59,540 --> 00:03:01,100
We get to the end of the vector

74
00:03:01,100 --> 00:03:02,850
and we wrap around to the beginning.

75
00:03:02,850 --> 00:03:06,623
So 16 is inserted at position zero.

76
00:03:08,430 --> 00:03:10,530
Now this sequence of steps that we take

77
00:03:10,530 --> 00:03:12,303
is called a probe sequence.

78
00:03:13,260 --> 00:03:14,460
So in the case of nine,

79
00:03:14,460 --> 00:03:16,223
our probe sequence was just two.

80
00:03:17,060 --> 00:03:20,610
In the case of 44, we had
to probe two and three.

81
00:03:20,610 --> 00:03:23,720
For 30, we had to probe
two, three, and four.

82
00:03:23,720 --> 00:03:27,860
For two, we had to probe two,
three, four, five, and six.

83
00:03:27,860 --> 00:03:32,160
And for 16, we probed all the
way back around to zero again.

84
00:03:32,160 --> 00:03:36,140
So now what happens when
we want to find an element?

85
00:03:36,140 --> 00:03:40,060
Well, we just follow
the same probe sequence

86
00:03:40,060 --> 00:03:41,680
until we find our target

87
00:03:41,680 --> 00:03:44,930
or we encounter an empty
position in our hash table,

88
00:03:44,930 --> 00:03:46,560
at which point we'll know that the target

89
00:03:46,560 --> 00:03:48,253
doesn't exist in the hash table.

90
00:03:50,060 --> 00:03:52,600
So for example, to find 30,

91
00:03:52,600 --> 00:03:54,670
we calculate the hash in two.

92
00:03:54,670 --> 00:03:56,820
We start at index two,

93
00:03:56,820 --> 00:03:58,310
we find that it's not there,

94
00:03:58,310 --> 00:04:00,563
and so we probe until we find 30.

95
00:04:01,520 --> 00:04:05,860
Now 72 is not in this hash table.

96
00:04:05,860 --> 00:04:07,650
So what happens then?

97
00:04:07,650 --> 00:04:09,090
Well, we calculate the hash,

98
00:04:09,090 --> 00:04:10,100
again, we get two,

99
00:04:10,100 --> 00:04:11,570
we start at two,

100
00:04:11,570 --> 00:04:14,040
and we continue our linear probe

101
00:04:14,040 --> 00:04:17,100
until we arrive at an empty position.

102
00:04:17,100 --> 00:04:18,770
Once we hit that empty position,

103
00:04:18,770 --> 00:04:22,070
then we know that 72 is
not in the hash table.

104
00:04:22,070 --> 00:04:23,783
So that's how find works.

105
00:04:25,480 --> 00:04:26,990
Now what about removing?

106
00:04:26,990 --> 00:04:30,200
Well, let's say we want to remove 16.

107
00:04:30,200 --> 00:04:31,200
Calculate that hash,

108
00:04:31,200 --> 00:04:32,100
we start at two,

109
00:04:32,100 --> 00:04:35,940
we probe in a linear
fashion until we get to 16,

110
00:04:35,940 --> 00:04:38,833
we match that value and then we delete it.

111
00:04:39,980 --> 00:04:41,053
So far so good.

112
00:04:42,290 --> 00:04:45,400
Now what happens if we delete 30?

113
00:04:45,400 --> 00:04:47,830
Well, we start at two,

114
00:04:47,830 --> 00:04:48,970
we take two steps,

115
00:04:48,970 --> 00:04:51,703
we find 30 and we delete it.

116
00:04:52,980 --> 00:04:54,950
But now we got a problem.

117
00:04:54,950 --> 00:04:56,433
Can you see what it is?

118
00:04:57,530 --> 00:05:00,760
What happens if we try to find two now?

119
00:05:00,760 --> 00:05:02,420
If we try to find two,

120
00:05:02,420 --> 00:05:04,140
we're gonna start at index two

121
00:05:04,140 --> 00:05:07,960
because that's the value
returned by our hash function.

122
00:05:07,960 --> 00:05:12,960
And then we probe twice and we
arrive at an empty position.

123
00:05:12,970 --> 00:05:16,770
And so now we think
two's not in our table.

124
00:05:16,770 --> 00:05:21,130
But it is in fact in our
table at index six here.

125
00:05:21,130 --> 00:05:24,080
And by deleting 30,

126
00:05:24,080 --> 00:05:26,970
we broke our probe sequence,

127
00:05:26,970 --> 00:05:27,970
and that's not good.

128
00:05:29,090 --> 00:05:33,290
So what we do in cases of deleting

129
00:05:33,290 --> 00:05:36,230
is rather than just remove the item

130
00:05:36,230 --> 00:05:39,943
and return the position to an empty state,

131
00:05:41,070 --> 00:05:43,240
we have another state in our table

132
00:05:43,240 --> 00:05:47,433
that indicates an item has
been deleted or removed.

133
00:05:48,400 --> 00:05:52,233
And in this case, it
preserves our probe sequence.

134
00:05:53,210 --> 00:05:55,860
So now when we want to find two,

135
00:05:55,860 --> 00:05:58,370
we don't stop at four
thinking we haven't found it,

136
00:05:58,370 --> 00:06:02,300
we can continue on our
way until we find two.

137
00:06:02,300 --> 00:06:05,270
So you'll see more about how this works

138
00:06:05,270 --> 00:06:08,010
when we implement this in C++.

139
00:06:08,010 --> 00:06:11,170
But be aware that we need to ensure

140
00:06:11,170 --> 00:06:13,433
that our probe sequence isn't broken.

141
00:06:16,860 --> 00:06:19,690
Linear probing is prone to a phenomenon

142
00:06:19,690 --> 00:06:22,080
called primary clustering.

143
00:06:22,080 --> 00:06:25,610
And that's because every time
there's a hash collision,

144
00:06:25,610 --> 00:06:27,370
we just look in the next position,

145
00:06:27,370 --> 00:06:29,570
and then the next, and then the next.

146
00:06:29,570 --> 00:06:33,060
And we tend to fill up blocks
of contiguous positions

147
00:06:33,060 --> 00:06:34,610
in our hash table.

148
00:06:34,610 --> 00:06:36,053
That's primary clustering.

149
00:06:37,810 --> 00:06:41,180
The downside, when we
have primary clustering,

150
00:06:41,180 --> 00:06:43,760
is we have to keep probing again and again

151
00:06:43,760 --> 00:06:47,430
and again and again until
we find an empty spot.

152
00:06:47,430 --> 00:06:50,900
So that's the problem that
we'll address a little later.

153
00:06:50,900 --> 00:06:54,103
But you should be aware that
it occurs with linear probing.

154
00:06:55,260 --> 00:06:58,460
So let's compare linear
probing with separate chaining.

155
00:06:58,460 --> 00:07:02,630
In the case of linear probing,
on collisions, we probe.

156
00:07:02,630 --> 00:07:03,760
With separate chaining,

157
00:07:03,760 --> 00:07:05,920
on collisions, we just extend the chain.

158
00:07:05,920 --> 00:07:07,573
We add one more to the bucket.

159
00:07:08,840 --> 00:07:10,520
In the case of linear probing,

160
00:07:10,520 --> 00:07:12,080
there's a fixed upper limit

161
00:07:12,080 --> 00:07:14,620
on the number of objects we can insert.

162
00:07:14,620 --> 00:07:17,240
That's the size of our hash table.

163
00:07:17,240 --> 00:07:18,770
We don't have that restriction

164
00:07:18,770 --> 00:07:20,970
in the case of separate chaining,

165
00:07:20,970 --> 00:07:23,880
because we can extend the chains.

166
00:07:23,880 --> 00:07:27,400
Linear probing again is
prone to primary clustering

167
00:07:27,400 --> 00:07:28,880
but with separate chaining,

168
00:07:28,880 --> 00:07:30,340
clustering does not occur

169
00:07:30,340 --> 00:07:32,090
since we're not doing that probing.

170
00:07:34,090 --> 00:07:34,923
In summary,

171
00:07:34,923 --> 00:07:37,000
with a linear probing scheme,

172
00:07:37,000 --> 00:07:40,233
we only allow a single
object at a given index.

173
00:07:41,090 --> 00:07:42,210
Upon hash collisions,

174
00:07:42,210 --> 00:07:45,170
we probe our hash table one step at a time

175
00:07:45,170 --> 00:07:46,940
until we find an empty position

176
00:07:46,940 --> 00:07:48,340
where we can put our object.

177
00:07:49,730 --> 00:07:51,050
And unlike separate chaining

178
00:07:51,050 --> 00:07:53,180
where we can extend our chains,

179
00:07:53,180 --> 00:07:55,360
linear probing has a fixed upper limit

180
00:07:55,360 --> 00:07:56,410
on the number of objects

181
00:07:56,410 --> 00:07:58,373
we can insert into our hash table.

182
00:07:59,610 --> 00:08:00,530
So before we finish,

183
00:08:00,530 --> 00:08:02,880
I'm gonna leave you with
a couple of questions.

184
00:08:03,750 --> 00:08:05,950
Because we have this upper limit,

185
00:08:05,950 --> 00:08:07,380
the size of our hash table,

186
00:08:07,380 --> 00:08:11,280
what can we do when we run out
of space in our hash table?

187
00:08:11,280 --> 00:08:12,883
We'll see about that shortly.

188
00:08:14,190 --> 00:08:16,350
And then another question.

189
00:08:16,350 --> 00:08:19,480
If we set our stride to
some value greater than one,

190
00:08:19,480 --> 00:08:21,390
say two or three,

191
00:08:21,390 --> 00:08:24,710
why is it a good idea to
have a hash table size

192
00:08:24,710 --> 00:08:26,123
that's a prime number?

193
00:08:27,260 --> 00:08:29,320
Give that some thought, too.

194
00:08:29,320 --> 00:08:30,870
I'll see you in the next video.