1 00:00:01,080 --> 00:00:03,480 - [Instructor] The dynamic equivalence, problem 2 00:00:03,480 --> 00:00:05,203 and disjoint sets. 3 00:00:07,040 --> 00:00:11,260 In an earlier video, we saw the dynamic equivalence problem. 4 00:00:11,260 --> 00:00:13,310 I'll restate it here. 5 00:00:13,310 --> 00:00:15,410 Say we have a collection of objects, 6 00:00:15,410 --> 00:00:16,970 we'd like to be able to determine 7 00:00:16,970 --> 00:00:18,630 if two objects are related 8 00:00:18,630 --> 00:00:21,900 to one another by some equivalence relation 9 00:00:21,900 --> 00:00:24,030 and we'd like to be able to connect these objects 10 00:00:24,030 --> 00:00:25,940 by introducing and equivalence 11 00:00:26,780 --> 00:00:29,730 and we'd like to be able to perform both of these operations 12 00:00:29,730 --> 00:00:32,670 as quickly and efficiently as possible. 13 00:00:32,670 --> 00:00:34,943 That's the dynamic equivalence problem. 14 00:00:37,000 --> 00:00:38,937 The two operations that we're speaking of 15 00:00:38,937 --> 00:00:41,100 are find and union. 16 00:00:41,100 --> 00:00:43,630 The fine method determines whether or not 17 00:00:43,630 --> 00:00:45,430 two elements are equivalent. 18 00:00:45,430 --> 00:00:49,520 That is, are they members of the same equivalence class 19 00:00:49,520 --> 00:00:51,640 or do they reside in the same set 20 00:00:51,640 --> 00:00:54,043 which is disjoint from all other sets. 21 00:00:55,370 --> 00:00:57,040 The union method, 22 00:00:57,040 --> 00:01:00,010 takes the union of two equivalents classes 23 00:01:00,010 --> 00:01:01,350 effectively merging them 24 00:01:03,970 --> 00:01:06,970 and as we alluded to in the earlier video, 25 00:01:06,970 --> 00:01:09,600 when we're considering the design for this problem 26 00:01:09,600 --> 00:01:12,193 we need to choose an appropriate data structure. 27 00:01:13,320 --> 00:01:17,100 So here we're using an array as a data structure 28 00:01:17,100 --> 00:01:19,320 that represents this problem instance 29 00:01:19,320 --> 00:01:21,850 as a collection of disjoint sets. 30 00:01:21,850 --> 00:01:23,380 How is that? 31 00:01:23,380 --> 00:01:25,860 Well, each element in the vector 32 00:01:27,110 --> 00:01:31,070 indicates the representative of some set. 33 00:01:31,070 --> 00:01:32,210 So we read this, 34 00:01:32,210 --> 00:01:34,720 this is a index zero. 35 00:01:34,720 --> 00:01:36,670 This value is six. 36 00:01:36,670 --> 00:01:41,490 This means that element zero is in the set represented 37 00:01:41,490 --> 00:01:42,870 by the value six. 38 00:01:42,870 --> 00:01:46,100 So we see here's the representative value six, 39 00:01:46,100 --> 00:01:48,800 zero and six are together in the same set 40 00:01:48,800 --> 00:01:51,950 and notice that six is also represented by six. 41 00:01:51,950 --> 00:01:56,950 We've chosen one single representative of each disjoint set. 42 00:01:57,320 --> 00:02:00,690 So again, as we saw in the earlier video, 43 00:02:00,690 --> 00:02:04,220 we can optimize this problem for the find operation 44 00:02:04,220 --> 00:02:06,090 and we can have constant time 45 00:02:06,090 --> 00:02:08,880 that's big of O of one for the fine method 46 00:02:10,060 --> 00:02:13,780 or we can optimize this problem for union 47 00:02:13,780 --> 00:02:16,190 and have a constant time union method 48 00:02:17,780 --> 00:02:19,910 but we can't have both, 49 00:02:19,910 --> 00:02:21,430 like I said in the last video 50 00:02:21,430 --> 00:02:24,110 you can't have your cake and eat it too. 51 00:02:24,110 --> 00:02:25,463 So there's a trade-off, 52 00:02:26,600 --> 00:02:30,260 but it turns out that with a few clever tricks, 53 00:02:30,260 --> 00:02:32,460 we can get pretty close. 54 00:02:32,460 --> 00:02:36,090 So let's start cracking this problem 55 00:02:36,090 --> 00:02:38,850 by taking a look at what it would entail 56 00:02:38,850 --> 00:02:40,723 to optimize for find. 57 00:02:43,240 --> 00:02:46,770 Optimizing for find is a piece of cake. 58 00:02:46,770 --> 00:02:48,920 If we have this data structure 59 00:02:48,920 --> 00:02:50,680 where we choose one representative 60 00:02:50,680 --> 00:02:52,260 for each equivalence class 61 00:02:52,260 --> 00:02:55,360 and label all the elements belonging to the same class 62 00:02:55,360 --> 00:02:57,850 with the value for that representative, 63 00:02:57,850 --> 00:02:59,940 then all we need to do is compare values 64 00:02:59,940 --> 00:03:01,883 and we can do that in constant time. 65 00:03:02,850 --> 00:03:06,550 So for example, are the elements zero and five 66 00:03:06,550 --> 00:03:08,870 in the same equivalence class? 67 00:03:08,870 --> 00:03:10,940 We check the values of those indices. 68 00:03:10,940 --> 00:03:12,550 If the values are the same 69 00:03:12,550 --> 00:03:14,990 then these elements are in the same equivalence class. 70 00:03:14,990 --> 00:03:17,040 If not, then they aren't. 71 00:03:17,040 --> 00:03:21,070 So here, we see the zero and five 72 00:03:21,070 --> 00:03:23,870 are not in the same equivalence class. 73 00:03:23,870 --> 00:03:27,730 Zero is in the equivalence class represented by six 74 00:03:27,730 --> 00:03:31,263 and five is in the equivalence class represented by five. 75 00:03:32,240 --> 00:03:35,670 Now getting a value from a vector takes constant time. 76 00:03:35,670 --> 00:03:38,610 Comparing two values takes constant time. 77 00:03:38,610 --> 00:03:42,453 So this operation can be performed in constant time. 78 00:03:43,470 --> 00:03:44,860 Here's another example, 79 00:03:44,860 --> 00:03:47,800 are two and eight in the same equivalence class? 80 00:03:47,800 --> 00:03:49,950 Well, yeah, we look at the index two, 81 00:03:49,950 --> 00:03:51,940 we see a value three. 82 00:03:51,940 --> 00:03:54,330 We know that two in the class 83 00:03:54,330 --> 00:03:56,730 represented by the value three, 84 00:03:56,730 --> 00:04:00,610 eight also has a three here. 85 00:04:00,610 --> 00:04:03,960 We can see up above that eight and three are connected. 86 00:04:03,960 --> 00:04:07,690 They belong in the same equivalence class. 87 00:04:07,690 --> 00:04:09,770 So this is gonna return true. 88 00:04:09,770 --> 00:04:13,510 Again we're just looking up two values in a vector 89 00:04:13,510 --> 00:04:15,570 that takes place and constant time. 90 00:04:15,570 --> 00:04:17,620 The comparison takes place in constant time. 91 00:04:17,620 --> 00:04:19,570 The whole thing's done in constant time 92 00:04:20,720 --> 00:04:23,020 and this is gonna come as great shock, 93 00:04:23,020 --> 00:04:25,220 here's the code in C++. 94 00:04:25,220 --> 00:04:28,940 I'm not giving all of the surrounding code 95 00:04:28,940 --> 00:04:30,400 but imagine we have some class 96 00:04:30,400 --> 00:04:34,540 and we have some private member field, 97 00:04:34,540 --> 00:04:37,880 a vector of ints called elements 98 00:04:37,880 --> 00:04:40,000 and then here's this method 99 00:04:40,000 --> 00:04:42,830 that returns a Boolean called find. 100 00:04:42,830 --> 00:04:46,910 We take two ints which are indices. 101 00:04:46,910 --> 00:04:48,640 and we look at the values 102 00:04:48,640 --> 00:04:51,277 that we find at those two indices 103 00:04:51,277 --> 00:04:53,550 in our vector elements. 104 00:04:53,550 --> 00:04:55,450 So if these are equivalent, 105 00:04:55,450 --> 00:04:57,250 this is gonna return true. 106 00:04:57,250 --> 00:04:58,440 If they're not equivalent, 107 00:04:58,440 --> 00:04:59,853 it's gonna return false. 108 00:05:00,690 --> 00:05:02,000 Simple as that. 109 00:05:02,000 --> 00:05:03,800 So that's easy. 110 00:05:03,800 --> 00:05:05,480 So now we have a find operation 111 00:05:05,480 --> 00:05:06,840 that runs in constant time 112 00:05:06,840 --> 00:05:08,570 but what about union? 113 00:05:08,570 --> 00:05:11,743 Well, let's say we wanted to connect four and 10. 114 00:05:12,810 --> 00:05:14,740 We'd have to get representative values 115 00:05:14,740 --> 00:05:16,260 for the set containing four 116 00:05:16,260 --> 00:05:18,530 and the set containing 10. 117 00:05:18,530 --> 00:05:20,470 We'd have to choose a new representative 118 00:05:20,470 --> 00:05:21,930 from between those 119 00:05:21,930 --> 00:05:23,320 which we choose doesn't matter 120 00:05:23,320 --> 00:05:25,440 but we do have to choose 121 00:05:25,440 --> 00:05:28,360 and then we have to iterate through the entire array 122 00:05:28,360 --> 00:05:32,310 and update representatives to reflect the new connection. 123 00:05:32,310 --> 00:05:33,900 So let's walk through this example. 124 00:05:33,900 --> 00:05:35,943 We're gonna connect four and 10. 125 00:05:37,100 --> 00:05:39,050 So here now we have four and 10 126 00:05:39,970 --> 00:05:44,210 and the representative of four is four 127 00:05:44,210 --> 00:05:47,310 and the representative of 10 is 11. 128 00:05:47,310 --> 00:05:51,260 So let's choose four as the representative of the new union 129 00:05:52,190 --> 00:05:55,080 and now we have to go through the whole array 130 00:05:55,080 --> 00:05:57,100 and wherever we find an 11, 131 00:05:57,100 --> 00:05:59,630 we have to change it to a four 132 00:05:59,630 --> 00:06:01,550 and there's no other way to do that 133 00:06:01,550 --> 00:06:03,760 besides iterating through the whole array. 134 00:06:03,760 --> 00:06:05,500 So we gotta start at zero, 135 00:06:05,500 --> 00:06:07,700 is that an 11? No. 136 00:06:07,700 --> 00:06:11,660 Is that in 11? No, no, no, no, no. 137 00:06:11,660 --> 00:06:13,030 Until we get down to here 138 00:06:13,030 --> 00:06:15,415 and then we see we found an 11, 139 00:06:15,415 --> 00:06:17,414 we changed it to a four. 140 00:06:17,414 --> 00:06:18,576 We add another 11, 141 00:06:18,576 --> 00:06:20,340 we change it to a four. 142 00:06:20,340 --> 00:06:22,490 That's the process we have to go through. 143 00:06:22,490 --> 00:06:24,640 There was a fair amount of work to be done here, 144 00:06:24,640 --> 00:06:28,350 just changing the representatives. 145 00:06:28,350 --> 00:06:30,870 So while we have a constant time for find, 146 00:06:30,870 --> 00:06:33,320 we have a linear time for union 147 00:06:33,320 --> 00:06:35,710 because we have to iterate that entire vector 148 00:06:37,250 --> 00:06:40,990 and here's what that method might look like. 149 00:06:40,990 --> 00:06:43,290 Again, I'm not supplying the surrounding code. 150 00:06:43,290 --> 00:06:45,730 Here it is in C++. 151 00:06:45,730 --> 00:06:47,710 We've got this unite method 152 00:06:48,910 --> 00:06:52,010 and if two elements aren't already in the same class 153 00:06:52,010 --> 00:06:53,370 we call find first 154 00:06:53,370 --> 00:06:55,070 then we pick one representative, 155 00:06:55,070 --> 00:06:56,860 we call it the target 156 00:06:56,860 --> 00:06:58,240 and the one that we're gonna change, 157 00:06:58,240 --> 00:07:00,010 we call that change 158 00:07:00,010 --> 00:07:02,810 and then we iterate through the entire vector 159 00:07:04,080 --> 00:07:08,857 checking to see if the element at index i 160 00:07:10,570 --> 00:07:12,660 equals the value that we wanna change 161 00:07:12,660 --> 00:07:13,870 and if it does, 162 00:07:13,870 --> 00:07:16,850 then we set it equal to the target 163 00:07:16,850 --> 00:07:18,114 and then we're done. 164 00:07:18,114 --> 00:07:19,663 We do that whole loop. 165 00:07:20,590 --> 00:07:23,470 So again, if two elements aren't already in the same class 166 00:07:23,470 --> 00:07:24,760 we pick one representative, 167 00:07:24,760 --> 00:07:26,300 iterate through the entire array 168 00:07:26,300 --> 00:07:29,270 and update the representatives as needed. 169 00:07:29,270 --> 00:07:31,080 Now as an aside, 170 00:07:31,080 --> 00:07:33,460 notice that we name our method unite. 171 00:07:33,460 --> 00:07:34,880 This is for two reasons. 172 00:07:34,880 --> 00:07:36,910 First is a minor stylistic thing. 173 00:07:36,910 --> 00:07:41,110 We prefer that method names are verbs or begin with verbs 174 00:07:41,110 --> 00:07:43,090 but in this case more importantly, 175 00:07:43,090 --> 00:07:45,460 union is a C++ keyword. 176 00:07:45,460 --> 00:07:47,733 So we can't have a method with the same name. 177 00:07:48,940 --> 00:07:50,380 So we're gonna perform our unions 178 00:07:50,380 --> 00:07:52,690 by calling the unite method 179 00:07:52,690 --> 00:07:53,820 and that makes good sense. 180 00:07:53,820 --> 00:07:55,120 That's a good word for it, 181 00:07:56,100 --> 00:07:57,923 but we've still got this problem. 182 00:07:58,930 --> 00:08:02,210 We have constant time for find and linear time for union 183 00:08:02,210 --> 00:08:04,780 and what if we wanted to perform a lot of operations 184 00:08:04,780 --> 00:08:08,450 on large data sets say m operations. 185 00:08:08,450 --> 00:08:12,470 Our worst case would be O of n times n 186 00:08:12,470 --> 00:08:15,410 and with m approximately equal to n 187 00:08:15,410 --> 00:08:17,493 that's the same as O of n squared. 188 00:08:18,420 --> 00:08:21,420 So we've got a worst case for m operations 189 00:08:21,420 --> 00:08:24,040 where m is approximately equal to n 190 00:08:24,040 --> 00:08:25,500 of all of n squared 191 00:08:25,500 --> 00:08:28,563 and as we've seen O of n squared is pretty yucky. 192 00:08:29,540 --> 00:08:33,130 It's only suitable for small problem instances. 193 00:08:33,130 --> 00:08:34,500 So think about what happens 194 00:08:34,500 --> 00:08:37,530 if we have a million elements in our vector. 195 00:08:37,530 --> 00:08:40,640 On the laptop I'm using right now to make this presentation 196 00:08:40,640 --> 00:08:43,870 it takes about 3,500 microseconds 197 00:08:43,870 --> 00:08:47,840 to iterate and update a vector of a million elements. 198 00:08:47,840 --> 00:08:50,030 At that speed it would take about an hour 199 00:08:50,030 --> 00:08:52,150 to perform a million unions. 200 00:08:52,150 --> 00:08:53,183 That's no good. 201 00:08:54,230 --> 00:08:56,540 So we're gonna have look at some other approaches 202 00:08:56,540 --> 00:08:58,440 to see if we can do better. 203 00:08:58,440 --> 00:09:01,110 So we've optimized for find 204 00:09:01,110 --> 00:09:02,610 and we got stuck there. 205 00:09:02,610 --> 00:09:05,053 Well what happens if we optimize for union. 206 00:09:06,270 --> 00:09:08,190 Now here same problem instance 207 00:09:08,190 --> 00:09:11,000 except we're starting with all of our elements 208 00:09:11,000 --> 00:09:13,310 in their own singleton sets. 209 00:09:13,310 --> 00:09:14,800 We've got our 12 elements. 210 00:09:14,800 --> 00:09:16,500 Each is in its own disjoint set 211 00:09:17,350 --> 00:09:19,110 and we can see this in the vector. 212 00:09:19,110 --> 00:09:21,960 The value for index zero is zero. 213 00:09:21,960 --> 00:09:23,550 The value for index one is one. 214 00:09:23,550 --> 00:09:26,310 The value for index two is two and so on. 215 00:09:26,310 --> 00:09:27,320 For each element, 216 00:09:27,320 --> 00:09:29,253 the value and the index are the same. 217 00:09:30,970 --> 00:09:34,550 Now let's think of these as trees 218 00:09:34,550 --> 00:09:37,370 and sure it's fine to have a tree of a single node 219 00:09:37,370 --> 00:09:39,923 so each one of these is a tree, 220 00:09:40,830 --> 00:09:42,100 perfectly valid tree, 221 00:09:42,100 --> 00:09:43,693 doesn't have any branches, 222 00:09:45,120 --> 00:09:46,800 it only has a root 223 00:09:46,800 --> 00:09:49,120 but that's okay they're trees 224 00:09:50,010 --> 00:09:52,110 and so we have a forest of 12 trees. 225 00:09:52,110 --> 00:09:53,740 When we have a collection of trees 226 00:09:53,740 --> 00:09:55,020 we call it a forest 227 00:09:55,020 --> 00:09:56,570 that should come as no surprise 228 00:09:57,460 --> 00:10:00,560 but it turns out that looking at the problem this way 229 00:10:00,560 --> 00:10:02,700 makes our union operations easy. 230 00:10:02,700 --> 00:10:04,910 When we look at things as trees, 231 00:10:04,910 --> 00:10:06,250 what do I mean by that? 232 00:10:06,250 --> 00:10:09,110 Given a naive approach, we could perform union 233 00:10:09,110 --> 00:10:12,030 by setting the parent of six here to be zero. 234 00:10:12,030 --> 00:10:14,163 We've got union zero six. 235 00:10:15,000 --> 00:10:18,450 Well, let's take six and point that to zero 236 00:10:18,450 --> 00:10:22,720 and we've got a little tree with one branch and a leaf here 237 00:10:22,720 --> 00:10:24,570 and now we have a disjoint set 238 00:10:24,570 --> 00:10:28,090 consisting of zero and six represented as a tree 239 00:10:28,090 --> 00:10:29,820 with zero at the root. 240 00:10:29,820 --> 00:10:31,133 Okay that works. 241 00:10:33,420 --> 00:10:34,500 Under this arrangement, 242 00:10:34,500 --> 00:10:36,490 how would we perform find? 243 00:10:36,490 --> 00:10:40,100 Well, we'd look to see if two elements share the same root 244 00:10:40,100 --> 00:10:41,020 and if they do 245 00:10:41,020 --> 00:10:43,600 then they're in the same disjoint set. 246 00:10:43,600 --> 00:10:44,433 How do we find the root? 247 00:10:44,433 --> 00:10:46,620 Well we just trace that path back 248 00:10:46,620 --> 00:10:48,560 until we find an element whose index 249 00:10:48,560 --> 00:10:49,790 and the value are the same 250 00:10:49,790 --> 00:10:51,640 and that way we know we found a root. 251 00:10:52,570 --> 00:10:53,930 So in this case, 252 00:10:53,930 --> 00:10:56,593 we've got find eight and six. 253 00:10:57,840 --> 00:10:59,510 We start at eight. 254 00:10:59,510 --> 00:11:03,040 We look at the value, the root of eight is eight. 255 00:11:03,040 --> 00:11:05,423 So that's in its own singleton set. 256 00:11:06,590 --> 00:11:07,610 We look at six, 257 00:11:07,610 --> 00:11:09,740 we see that six is pointing to zero. 258 00:11:09,740 --> 00:11:11,560 We follow that back to zero. 259 00:11:11,560 --> 00:11:12,863 We look at zero, 260 00:11:13,920 --> 00:11:15,960 the index and the value are the same. 261 00:11:15,960 --> 00:11:17,750 So that's a root. 262 00:11:17,750 --> 00:11:20,110 So the root of eight is eight, 263 00:11:20,110 --> 00:11:22,740 the root of six zero 264 00:11:22,740 --> 00:11:25,640 and we know that those aren't in the same set, okay? 265 00:11:25,640 --> 00:11:27,740 They're not in the same equivalence class, 266 00:11:29,880 --> 00:11:31,330 but without modification, 267 00:11:31,330 --> 00:11:33,200 this approach can get us into trouble. 268 00:11:33,200 --> 00:11:36,780 Maybe you can anticipate where we're heading with this. 269 00:11:36,780 --> 00:11:39,960 Let's say we do union six, four, 270 00:11:39,960 --> 00:11:41,540 well, here's six 271 00:11:42,630 --> 00:11:43,740 and there's four, 272 00:11:43,740 --> 00:11:45,720 four sitting in its own singleton set 273 00:11:45,720 --> 00:11:46,553 and what we're gonna do 274 00:11:46,553 --> 00:11:50,223 is we're gonna change this four to point to six. 275 00:11:51,320 --> 00:11:54,050 So we get something that looks like this. 276 00:11:54,050 --> 00:11:56,260 Here is four points to six. 277 00:11:56,260 --> 00:11:59,493 We go to six point to zero, zero, okay? 278 00:12:00,900 --> 00:12:02,850 Fine little tree there. 279 00:12:02,850 --> 00:12:04,970 No problem so far, 280 00:12:04,970 --> 00:12:06,460 but what happens now 281 00:12:07,620 --> 00:12:11,993 if we call union with the parameters four and zero. 282 00:12:13,810 --> 00:12:16,020 So if we're going to perform this operation 283 00:12:16,020 --> 00:12:17,793 union four zero, 284 00:12:18,730 --> 00:12:21,630 under the union operation as we've defined it so far 285 00:12:21,630 --> 00:12:24,080 this would result in pointing zero to four 286 00:12:25,140 --> 00:12:26,390 and now we got a problem. 287 00:12:27,720 --> 00:12:32,070 Zero points to four, four points to six, six points to zero 288 00:12:32,070 --> 00:12:33,420 zero points to four 289 00:12:33,420 --> 00:12:34,330 and we've got a loop, 290 00:12:34,330 --> 00:12:36,110 we've got a cycle. 291 00:12:36,110 --> 00:12:37,883 It's no longer a tree. 292 00:12:38,730 --> 00:12:40,710 We have a cycle with no root 293 00:12:40,710 --> 00:12:43,683 and therefore our find operation can't work. 294 00:12:44,770 --> 00:12:47,060 Now it turns out that there's a simple fix 295 00:12:47,900 --> 00:12:49,790 when performing union 296 00:12:49,790 --> 00:12:53,010 instead of just doing this naive connection. 297 00:12:53,010 --> 00:12:56,160 We first find the root of each of the two elements 298 00:12:56,160 --> 00:12:57,520 to be United 299 00:12:57,520 --> 00:13:01,350 and then point the root of one to the root of the other 300 00:13:01,350 --> 00:13:03,260 and that solves our problem. 301 00:13:03,260 --> 00:13:04,910 To take a step back 302 00:13:04,910 --> 00:13:08,410 if we were to were to perform a union, 303 00:13:08,410 --> 00:13:11,970 we'd find that four's route is zero 304 00:13:11,970 --> 00:13:13,730 by tracing this path back 305 00:13:13,730 --> 00:13:16,230 and zero's route is zero 306 00:13:16,230 --> 00:13:18,510 and we'd update zero to point to zero. 307 00:13:18,510 --> 00:13:20,580 That's no change at all 308 00:13:20,580 --> 00:13:23,580 and so in this way we don't form cycles, 309 00:13:23,580 --> 00:13:25,100 but there's a problem now, 310 00:13:25,100 --> 00:13:27,720 we've traded one problem for another. 311 00:13:27,720 --> 00:13:30,650 The performance of union is now dependent 312 00:13:30,650 --> 00:13:32,580 upon the performance of find 313 00:13:33,620 --> 00:13:34,910 and let's keep that in mind 314 00:13:34,910 --> 00:13:38,360 as we walk through another little demonstration. 315 00:13:38,360 --> 00:13:39,830 So let's start over. 316 00:13:39,830 --> 00:13:44,830 We have all of our elements in Singleton sets. 317 00:13:44,860 --> 00:13:47,690 So we have 12 disjoint sets right now 318 00:13:49,650 --> 00:13:51,790 and we're gonna take union zero and six 319 00:13:51,790 --> 00:13:53,490 like we did before. 320 00:13:53,490 --> 00:13:54,740 The root of zero is zero. 321 00:13:54,740 --> 00:13:56,030 The root of six is six. 322 00:13:56,030 --> 00:13:58,410 So we'll set the root of six to be zero. 323 00:13:58,410 --> 00:13:59,410 There we go. 324 00:13:59,410 --> 00:14:01,110 We're good for one step. 325 00:14:01,110 --> 00:14:03,330 Now union seven, four, 326 00:14:03,330 --> 00:14:04,550 we'll just take four, 327 00:14:04,550 --> 00:14:06,703 we'll point it to seven, all good. 328 00:14:09,070 --> 00:14:13,080 Now union eight, two, okay? 329 00:14:13,080 --> 00:14:15,770 We've got two comes over here. 330 00:14:15,770 --> 00:14:18,160 These are of course all still part of the same vector 331 00:14:18,160 --> 00:14:19,390 we're just moving them around 332 00:14:19,390 --> 00:14:21,650 to show how they look like a tree, 333 00:14:21,650 --> 00:14:24,930 but this is still a linear structure. 334 00:14:24,930 --> 00:14:27,830 So two becomes a child of eight 335 00:14:27,830 --> 00:14:31,290 but now what happens if we union six and two? 336 00:14:31,290 --> 00:14:33,563 Here's six, here's two. 337 00:14:35,770 --> 00:14:39,480 Well, we calculate that the root of six is zero 338 00:14:39,480 --> 00:14:42,380 and the root of two was eight 339 00:14:42,380 --> 00:14:45,820 so we make zero the root of eight, 340 00:14:45,820 --> 00:14:49,210 essentially grafting it onto this tree here. 341 00:14:49,210 --> 00:14:54,210 So now these elements are connected in a tree. 342 00:14:54,370 --> 00:14:58,450 They form one disjoint set or equivalence class 343 00:14:58,450 --> 00:15:00,620 and these other guys are out here 344 00:15:00,620 --> 00:15:03,320 sitting in their equivalence classes, 345 00:15:03,320 --> 00:15:04,283 separate from that. 346 00:15:06,380 --> 00:15:09,130 Now let's do union seven, six. 347 00:15:09,130 --> 00:15:12,910 Well, before the parent of six was zero 348 00:15:14,270 --> 00:15:16,450 and the parent of seven is seven. 349 00:15:16,450 --> 00:15:18,797 So we're going to graph that tree 350 00:15:20,910 --> 00:15:24,390 that was rooted at zero onto seven here 351 00:15:24,390 --> 00:15:27,363 now by making seven the parent of zero. 352 00:15:28,350 --> 00:15:30,193 Now we'll union one and three. 353 00:15:32,290 --> 00:15:34,200 Now union three and five 354 00:15:35,890 --> 00:15:38,003 and now union eight and three. 355 00:15:39,140 --> 00:15:41,240 It should be clear at this point 356 00:15:41,240 --> 00:15:43,133 that we have yet another problem. 357 00:15:44,110 --> 00:15:47,780 The union method is dependent on finding the root 358 00:15:47,780 --> 00:15:51,670 and the getRoot method requires tracing a path back 359 00:15:51,670 --> 00:15:53,640 from an element to its root 360 00:15:55,859 --> 00:15:58,500 but now these trees can get pretty tall 361 00:15:58,500 --> 00:16:00,080 and with the tall tree 362 00:16:00,080 --> 00:16:02,870 finding the root can be a costly operation, 363 00:16:02,870 --> 00:16:06,440 here if I wanna find the root of two, 364 00:16:06,440 --> 00:16:11,000 I have to trace back to eight to zero to seven 365 00:16:11,000 --> 00:16:14,980 back to one to find it's root. 366 00:16:14,980 --> 00:16:17,530 So that can be a costly operation. 367 00:16:17,530 --> 00:16:18,500 Let's look at the code. 368 00:16:18,500 --> 00:16:19,870 Here's an implementation. 369 00:16:19,870 --> 00:16:21,423 We have three methods. 370 00:16:22,710 --> 00:16:25,440 Again, I'm not supplying the surrounding code. 371 00:16:25,440 --> 00:16:26,743 Just focus on these. 372 00:16:27,630 --> 00:16:30,570 We have one to get the root of an arbitrary element 373 00:16:30,570 --> 00:16:32,410 we pass in an integer 374 00:16:32,410 --> 00:16:34,670 that indicates the element in question, 375 00:16:34,670 --> 00:16:38,760 we have a while loop and as long as the value 376 00:16:38,760 --> 00:16:42,070 does not equal the index, we keep going. 377 00:16:42,070 --> 00:16:46,400 We keep stepping back until we find the appropriate node 378 00:16:46,400 --> 00:16:47,883 and then we return that node. 379 00:16:48,870 --> 00:16:51,130 The fine method compares the roots 380 00:16:51,130 --> 00:16:54,740 of two arbitrary elements and returns a Boolean. 381 00:16:54,740 --> 00:16:57,840 So here's getRoot a getRoot b. 382 00:16:57,840 --> 00:16:59,310 Are they equal? 383 00:16:59,310 --> 00:17:00,600 Return true or false 384 00:17:02,530 --> 00:17:06,300 and unite finds the root of both elements 385 00:17:06,300 --> 00:17:09,693 and points the root of one tree to the root of the other. 386 00:17:11,030 --> 00:17:12,490 Now we're not done yet 387 00:17:12,490 --> 00:17:14,540 because of the problem we've pointed out. 388 00:17:14,540 --> 00:17:16,326 Union depends on find and find 389 00:17:16,326 --> 00:17:19,310 has to trace paths back to roots 390 00:17:19,310 --> 00:17:21,690 and the trees can get pretty tall. 391 00:17:21,690 --> 00:17:25,410 So while union strictly by itself takes place 392 00:17:25,410 --> 00:17:26,360 in constant time, 393 00:17:26,360 --> 00:17:29,560 the union method requires calling find. 394 00:17:29,560 --> 00:17:31,700 So a find has linear complexity 395 00:17:31,700 --> 00:17:34,980 so does union and this is not satisfactory. 396 00:17:34,980 --> 00:17:39,840 So here we have find takes linear time 397 00:17:39,840 --> 00:17:42,340 and union takes linear time. 398 00:17:42,340 --> 00:17:45,340 So again we have a worst case for m operations 399 00:17:45,340 --> 00:17:47,110 of all of m times n 400 00:17:47,110 --> 00:17:49,350 which is approximately O of n squared 401 00:17:49,350 --> 00:17:52,050 four m approximately equal to n 402 00:17:52,050 --> 00:17:54,310 and it seems like we're right back where we started 403 00:17:54,310 --> 00:17:56,010 but in the next video, 404 00:17:56,010 --> 00:17:57,870 we'll see a couple of neat tricks 405 00:17:57,870 --> 00:17:59,680 as to how we can improve matters. 406 00:17:59,680 --> 00:18:03,830 We're gonna take this optimization for union 407 00:18:03,830 --> 00:18:05,250 and repair it 408 00:18:05,250 --> 00:18:07,660 and we'll wind up with something that operates 409 00:18:07,660 --> 00:18:09,143 in sub linear time. 410 00:18:10,210 --> 00:18:11,160 That's all for now.