1 00:00:01,890 --> 00:00:04,410 - [Instructor] All right, Kruskal's algorithm. 2 00:00:04,410 --> 00:00:08,880 In an earlier video, we saw Prim's algorithm, 3 00:00:08,880 --> 00:00:13,150 which is designed to find a minimum spanning tree 4 00:00:13,150 --> 00:00:15,290 in a connected graph. 5 00:00:15,290 --> 00:00:16,350 And we had a little reminder 6 00:00:16,350 --> 00:00:19,220 about trees and graphs and things like that. 7 00:00:19,220 --> 00:00:21,460 And in this video, 8 00:00:21,460 --> 00:00:23,190 we're gonna look at Kruskal's algorithm, 9 00:00:23,190 --> 00:00:25,763 which solves the same problem as Prim's, 10 00:00:26,720 --> 00:00:29,800 finding the minimum spanning tree in a graph, 11 00:00:29,800 --> 00:00:33,170 but it does it with a completely different approach. 12 00:00:33,170 --> 00:00:35,620 So before we dive in, 13 00:00:35,620 --> 00:00:37,390 let's just again remind ourselves 14 00:00:37,390 --> 00:00:39,290 what is a minimum spanning tree? 15 00:00:39,290 --> 00:00:43,920 Let's say we have some graph here on some nodes, 16 00:00:43,920 --> 00:00:48,300 the node set given by V and the edge set given by E. 17 00:00:48,300 --> 00:00:51,030 And we have weights on all of these edges. 18 00:00:51,030 --> 00:00:53,270 These are weighted edges. 19 00:00:53,270 --> 00:00:55,670 And we want to find a minimum spanning tree. 20 00:00:55,670 --> 00:01:00,040 That's a tree that touches each one of the nodes. 21 00:01:00,040 --> 00:01:02,820 We call this minimum spanning tree G prime. 22 00:01:02,820 --> 00:01:05,080 It's gonna have exactly the same node set. 23 00:01:05,080 --> 00:01:08,180 It's gonna have a different edge set we call E prime, 24 00:01:08,180 --> 00:01:11,660 where E prime is a subset of the edges 25 00:01:11,660 --> 00:01:13,720 in the underlying graph. 26 00:01:13,720 --> 00:01:15,230 We want G prime to be a tree. 27 00:01:15,230 --> 00:01:17,540 It's gotta be a spanning tree, right? 28 00:01:17,540 --> 00:01:21,020 And we want to minimize this sum. 29 00:01:21,020 --> 00:01:23,900 This sum is the sum of all of the weights 30 00:01:23,900 --> 00:01:25,870 of all of the edges in E prime. 31 00:01:25,870 --> 00:01:30,120 So for E prime here, for every u, v in E prime, 32 00:01:30,120 --> 00:01:32,790 we add the weight, that's our sum. 33 00:01:32,790 --> 00:01:34,440 So we wanna find the minimum of those. 34 00:01:34,440 --> 00:01:36,563 So that's a minimum spanning tree. 35 00:01:38,420 --> 00:01:40,320 And again, in the previous video, 36 00:01:40,320 --> 00:01:44,740 we saw this example of this minimum spanning tree 37 00:01:44,740 --> 00:01:47,080 on this graph here, 38 00:01:47,080 --> 00:01:52,080 where the sum of the edges in E prime was equal to 12, 39 00:01:52,590 --> 00:01:57,590 which is the smallest possible weight edges 40 00:01:57,620 --> 00:02:01,250 that we can find that spans all of G. 41 00:02:01,250 --> 00:02:03,560 So that's a minimum spanning tree. 42 00:02:03,560 --> 00:02:07,690 So what does Kruskal's algorithm do? 43 00:02:07,690 --> 00:02:12,200 Well, Kruskal's algorithm works differently than Prim's. 44 00:02:12,200 --> 00:02:17,200 And what we do is we take a list of all of the edges, 45 00:02:17,790 --> 00:02:21,860 we have that edge set, and we order them. 46 00:02:21,860 --> 00:02:26,860 We just notice that this is now a sorted list of edges. 47 00:02:28,020 --> 00:02:29,420 Bh has weight two, 48 00:02:29,420 --> 00:02:33,150 that's the smallest weight edge in this graph here. 49 00:02:33,150 --> 00:02:37,790 And ah, which is this one here, has a weight of nine. 50 00:02:37,790 --> 00:02:39,640 That's the largest weight edge. 51 00:02:39,640 --> 00:02:42,580 And so these just go from the smallest weight edge 52 00:02:42,580 --> 00:02:44,880 to the largest weight edge. 53 00:02:44,880 --> 00:02:46,053 And then what we do, 54 00:02:47,843 --> 00:02:51,273 what we did in Prim's algorithm is we picked a node 55 00:02:52,710 --> 00:02:56,110 and we said now let's find the minimum weight edge 56 00:02:56,110 --> 00:03:01,110 that connects us to this subset of nodes V prime. 57 00:03:03,520 --> 00:03:05,280 We're gonna do this differently. 58 00:03:05,280 --> 00:03:08,757 We're simply gonna take the minimum weight edge here 59 00:03:10,920 --> 00:03:14,930 and we're gonna add that to our minimum spanning tree. 60 00:03:14,930 --> 00:03:17,350 And we're just gonna keep doing that. 61 00:03:17,350 --> 00:03:20,290 We're gonna throw out edges that form cycles. 62 00:03:20,290 --> 00:03:23,710 So if a cycle is formed, we're just gonna toss that edge, 63 00:03:23,710 --> 00:03:25,570 and we're gonna move on to the next one. 64 00:03:25,570 --> 00:03:27,200 And we're just going to do that 65 00:03:27,200 --> 00:03:29,900 until we either exhaust all the edges. 66 00:03:29,900 --> 00:03:32,790 Maybe the last one is the last one we add. 67 00:03:32,790 --> 00:03:36,280 Or we have a set of edges 68 00:03:36,280 --> 00:03:39,560 that touches every node in the graph, 69 00:03:39,560 --> 00:03:41,020 and that's Kruskal's algorithm. 70 00:03:41,020 --> 00:03:42,650 It's really, really simple. 71 00:03:42,650 --> 00:03:45,150 It's a very elegant algorithm. 72 00:03:45,150 --> 00:03:47,420 But it does mean that we have to check 73 00:03:47,420 --> 00:03:49,010 to see if there are cycles formed 74 00:03:49,010 --> 00:03:52,320 and throw out the edges that cause cycles. 75 00:03:52,320 --> 00:03:55,210 So again, the first one we add is this one bh, 76 00:03:55,210 --> 00:03:57,910 cause it's the first one in our list. 77 00:03:57,910 --> 00:04:01,080 We gray that out to show that that's no longer available. 78 00:04:01,080 --> 00:04:04,920 We check aj, can we add aj? 79 00:04:04,920 --> 00:04:06,220 Yeah, sure, we can add aj. 80 00:04:06,220 --> 00:04:07,390 Does it form a cycle? 81 00:04:07,390 --> 00:04:08,930 No, it doesn't form a cycle. 82 00:04:08,930 --> 00:04:09,763 So we're good. 83 00:04:10,850 --> 00:04:13,630 Now bc, that's the next one on the list. 84 00:04:13,630 --> 00:04:14,463 Can we add that? 85 00:04:14,463 --> 00:04:15,296 Yeah, sure. 86 00:04:15,296 --> 00:04:17,453 It doesn't form a cycle. We're safe. 87 00:04:18,580 --> 00:04:22,700 Now the next one on the list is ac, can we add ac? 88 00:04:22,700 --> 00:04:25,023 Yup, doesn't form a cycle. 89 00:04:26,310 --> 00:04:28,093 Next one on the list is dh. 90 00:04:30,006 --> 00:04:31,860 Dh, that's this one here. 91 00:04:31,860 --> 00:04:32,693 Can we add that? 92 00:04:32,693 --> 00:04:33,526 Sure, we can add that. 93 00:04:33,526 --> 00:04:34,903 That doesn't add a cycle. 94 00:04:36,560 --> 00:04:38,930 And now we have a minimum spanning tree. 95 00:04:38,930 --> 00:04:41,200 Now that was a little bit of a contrived example 96 00:04:41,200 --> 00:04:44,210 so that we didn't encounter any cycles. 97 00:04:44,210 --> 00:04:46,210 But it gives you an idea. 98 00:04:46,210 --> 00:04:50,280 Let's do a different example where we have some cycles. 99 00:04:50,280 --> 00:04:52,130 We have to throw out some edges. 100 00:04:52,130 --> 00:04:56,290 So here again, we have a different graph, same idea though. 101 00:04:56,290 --> 00:04:59,130 We have a collection of nodes 102 00:04:59,130 --> 00:05:00,820 and we have a collection of weighted edges 103 00:05:00,820 --> 00:05:04,610 and we sort the edges by their edge weight. 104 00:05:04,610 --> 00:05:08,310 And notice now before we run from two to nine, 105 00:05:08,310 --> 00:05:11,150 and here we have a whole bunch of conflicts here. 106 00:05:11,150 --> 00:05:14,090 So that's gonna make this run through the algorithm 107 00:05:14,090 --> 00:05:16,050 a little more interesting. 108 00:05:16,050 --> 00:05:19,980 So we start with the minimum weight edge here. 109 00:05:19,980 --> 00:05:21,080 We could pick any of them, 110 00:05:21,080 --> 00:05:23,850 but we'll pick this one, ac, 111 00:05:23,850 --> 00:05:27,160 and we add that to our minimum spanning tree. 112 00:05:27,160 --> 00:05:29,890 And then we check the next one is ad. 113 00:05:29,890 --> 00:05:31,000 So can we add that? 114 00:05:31,000 --> 00:05:33,210 Yeah, sure, we can add that, that's fine. 115 00:05:33,210 --> 00:05:37,270 Now ch, ch is this one here. 116 00:05:37,270 --> 00:05:38,860 Yeah, that's safe to add. 117 00:05:38,860 --> 00:05:40,163 So we'll add that one. 118 00:05:41,560 --> 00:05:43,070 Now, oh, we got a problem. 119 00:05:43,070 --> 00:05:45,670 Dh, dh is the next one on the list. 120 00:05:45,670 --> 00:05:49,130 But if we were to add dh, that would create a cycle, 121 00:05:49,130 --> 00:05:50,530 and we wouldn't have a tree anymore. 122 00:05:50,530 --> 00:05:52,530 So we're just gonna throw that one out. 123 00:05:52,530 --> 00:05:55,123 We're gonna chuck it, cross it off the list. 124 00:05:56,080 --> 00:05:57,250 Now what's the next one? 125 00:05:57,250 --> 00:06:01,840 Aj, aj that looks like that's safe to add. 126 00:06:01,840 --> 00:06:05,497 So we'll add that one to our minimum spanning tree 127 00:06:05,497 --> 00:06:07,560 and we're getting close to the end here. 128 00:06:07,560 --> 00:06:10,610 What's next? Dj. 129 00:06:10,610 --> 00:06:13,250 Dj, oops, that forms a cycle. 130 00:06:13,250 --> 00:06:15,343 See, that would form this cycle here. 131 00:06:16,250 --> 00:06:17,840 So we gotta throw that one out. 132 00:06:17,840 --> 00:06:19,240 So we'll throw that one out. 133 00:06:21,230 --> 00:06:23,750 And now what's next on list? 134 00:06:23,750 --> 00:06:27,510 Cd, cd, that would form a cycle. 135 00:06:27,510 --> 00:06:30,090 See this one, this would form a cycle here like this. 136 00:06:30,090 --> 00:06:31,710 So we can't use cd. 137 00:06:31,710 --> 00:06:33,113 So we throw cd up. 138 00:06:34,560 --> 00:06:36,120 What's next on our list? 139 00:06:36,120 --> 00:06:39,620 Bh, well, we can add bh. 140 00:06:39,620 --> 00:06:40,480 That's safe. 141 00:06:40,480 --> 00:06:43,190 It doesn't create a cycle. 142 00:06:43,190 --> 00:06:47,440 And so now we've completed our minimum spanning tree. 143 00:06:48,370 --> 00:06:53,330 There's an edge incident to every node in the graph. 144 00:06:53,330 --> 00:06:58,330 And we found a subset of edges that has a minimum weight. 145 00:06:58,580 --> 00:07:00,330 And I don't know about you, 146 00:07:00,330 --> 00:07:02,390 but I think that's a pretty cool algorithm. 147 00:07:02,390 --> 00:07:05,133 It's very simple in concept. 148 00:07:05,980 --> 00:07:08,463 So here's some pseudocode. 149 00:07:09,950 --> 00:07:10,890 We start off. 150 00:07:10,890 --> 00:07:13,210 This is our function kruskals 151 00:07:13,210 --> 00:07:15,870 and it takes G as a parameter. 152 00:07:15,870 --> 00:07:20,870 And we initialize our edge set E prime to the empty set. 153 00:07:21,090 --> 00:07:23,200 And then we take a little extra time. 154 00:07:23,200 --> 00:07:25,130 We sort the edges in V, 155 00:07:25,130 --> 00:07:28,640 and we can use a sorting algorithm and sort a vector 156 00:07:28,640 --> 00:07:30,210 or we can pop them into a min heap 157 00:07:30,210 --> 00:07:34,143 and let the min heap do the work for us, either is fine. 158 00:07:36,220 --> 00:07:41,220 And then as long as the size of this edge set E prime 159 00:07:42,090 --> 00:07:45,730 is less than the number of edges that we need, 160 00:07:45,730 --> 00:07:50,260 that's the cardinality of V minus one. 161 00:07:50,260 --> 00:07:53,170 If the next edge in the sort or the heap 162 00:07:53,170 --> 00:07:55,300 does not create a cycle, 163 00:07:55,300 --> 00:07:59,600 we add this edge to E prime and that's our while loop. 164 00:07:59,600 --> 00:08:02,540 And as soon as we hit the condition 165 00:08:02,540 --> 00:08:06,620 where the cardinality V prime equals 166 00:08:06,620 --> 00:08:08,500 the cardinality of V minus one, 167 00:08:08,500 --> 00:08:10,280 we know we're done and we return 168 00:08:10,280 --> 00:08:13,540 and we return the node set 169 00:08:13,540 --> 00:08:18,540 and the subset of the starting edge set E prime. 170 00:08:21,480 --> 00:08:22,910 And that's it. 171 00:08:22,910 --> 00:08:24,580 That's all there is to it. 172 00:08:24,580 --> 00:08:29,310 Now you may have noticed that I kind of swept 173 00:08:29,310 --> 00:08:30,920 a little detail under the rug. 174 00:08:30,920 --> 00:08:33,060 And what is that detail? 175 00:08:33,060 --> 00:08:38,060 The detail is how do we know that we formed a cycle? 176 00:08:39,210 --> 00:08:41,840 Well, this is where the study 177 00:08:41,840 --> 00:08:44,540 of algorithms gets really interesting 178 00:08:44,540 --> 00:08:46,040 as far as I'm concerned, 179 00:08:46,040 --> 00:08:50,433 because we can combine things that we've learned already. 180 00:08:51,270 --> 00:08:55,000 And there's a data structure and a procedure 181 00:08:55,000 --> 00:08:57,763 that makes this almost work like magic, 182 00:08:58,800 --> 00:09:02,073 and that's using disjoint sets find and union. 183 00:09:04,110 --> 00:09:05,470 So how does this work? 184 00:09:05,470 --> 00:09:08,650 We have, again, this is a revised pseudocode. 185 00:09:08,650 --> 00:09:10,830 We have Kruskal's takes in G, 186 00:09:10,830 --> 00:09:14,820 and we're gonna do this using disjoint sets find and union. 187 00:09:14,820 --> 00:09:19,610 And so we let E prime equals the empty set. 188 00:09:19,610 --> 00:09:22,600 And again we sort the edges in E. 189 00:09:22,600 --> 00:09:25,630 we can use a sorted vector or min heap. 190 00:09:25,630 --> 00:09:30,350 We make singleton sets out of all of the nodes in V. 191 00:09:30,350 --> 00:09:34,320 Remember that's how disjoint sets find and union works. 192 00:09:34,320 --> 00:09:38,260 We start off with singleton sets, 193 00:09:38,260 --> 00:09:42,640 and then we take the union of sets as we connect components. 194 00:09:42,640 --> 00:09:46,223 And so for each edge in E, 195 00:09:47,510 --> 00:09:51,610 if find u not equal find v, 196 00:09:51,610 --> 00:09:53,210 this is how we're detecting 197 00:09:53,210 --> 00:09:55,793 whether there's gonna be a cycle formed. 198 00:09:57,770 --> 00:10:02,770 If this query returns false, 199 00:10:04,560 --> 00:10:09,560 then we know that u and v are in the same set already. 200 00:10:12,500 --> 00:10:17,500 And if we add an edge between them, we'll create a cycle. 201 00:10:18,140 --> 00:10:23,140 So only in the event that find u not equals find v, 202 00:10:24,170 --> 00:10:26,910 we add the edge to E prime. 203 00:10:26,910 --> 00:10:28,850 So that gets added here. 204 00:10:28,850 --> 00:10:32,020 We take the union u and v. 205 00:10:32,020 --> 00:10:33,633 And we just keep doing that. 206 00:10:36,460 --> 00:10:37,293 You know what? 207 00:10:37,293 --> 00:10:39,730 This should be indented. 208 00:10:39,730 --> 00:10:41,590 I'm gonna fix that right here on the front. 209 00:10:41,590 --> 00:10:44,223 Oh, no, I'm gonna fix that here. 210 00:10:49,330 --> 00:10:50,313 Sorry, I bet. 211 00:10:52,850 --> 00:10:56,800 So for each edge u, v in E, we do this check. 212 00:10:56,800 --> 00:11:00,330 If this turns out to be true, 213 00:11:00,330 --> 00:11:02,640 then we add the edge, we take the union, 214 00:11:02,640 --> 00:11:05,700 and then we just keep exploring our edges. 215 00:11:05,700 --> 00:11:10,520 And that is a very elegant, I think, 216 00:11:10,520 --> 00:11:14,070 that's a very elegant way of solving this problem, 217 00:11:14,070 --> 00:11:18,210 where we use disjoint sets, we use a min heap, 218 00:11:18,210 --> 00:11:22,490 and we get a minimum spanning tree 219 00:11:22,490 --> 00:11:26,690 in just the fewest lines of pseudocode here. 220 00:11:26,690 --> 00:11:29,940 And implementing this in C++ or Java or Python 221 00:11:29,940 --> 00:11:32,763 or any other language is just dead simple. 222 00:11:33,720 --> 00:11:37,113 So the question, is it greedy? 223 00:11:38,350 --> 00:11:40,130 Yup, it's greedy, why? 224 00:11:40,130 --> 00:11:45,130 Because we grab the edge that we can add 225 00:11:46,630 --> 00:11:48,880 without forming a cycle. 226 00:11:48,880 --> 00:11:51,080 And we never reconsider that edge. 227 00:11:51,080 --> 00:11:52,830 We never take it out. 228 00:11:52,830 --> 00:11:55,963 Once make that decision, we've glommed onto it, we're done. 229 00:11:57,820 --> 00:12:00,830 What is the worst case complexity here? 230 00:12:00,830 --> 00:12:05,050 Well, the worst case complexity is big O 231 00:12:05,050 --> 00:12:08,690 of the cardinality of the edge set times the log 232 00:12:08,690 --> 00:12:10,530 of the cardinality of that edge set, 233 00:12:10,530 --> 00:12:13,600 and that's presuming that we're going to use min heap. 234 00:12:13,600 --> 00:12:16,820 The biggest cost is in sorting the edges by the weights. 235 00:12:16,820 --> 00:12:19,410 And so that's the most costly part. 236 00:12:19,410 --> 00:12:22,570 So if we use min heap to order everything 237 00:12:22,570 --> 00:12:24,370 and then just pop them off the heap, 238 00:12:25,660 --> 00:12:28,650 that gives us a big O of the cardinality 239 00:12:28,650 --> 00:12:31,020 of the edge set times the log of the cardinality 240 00:12:31,020 --> 00:12:32,380 of the edge set. 241 00:12:32,380 --> 00:12:36,280 And that, my friends, is Kruskal's algorithm. 242 00:12:36,280 --> 00:12:38,653 And I think that's a really cool algorithm. 243 00:12:39,490 --> 00:12:44,490 So this concludes our discussion of minimum spanning trees. 244 00:12:46,300 --> 00:12:50,790 And this is the last topic we're formally going to cover 245 00:12:50,790 --> 00:12:53,150 in these video lectures. 246 00:12:53,150 --> 00:12:58,150 So everything else is going to be posted on the discussions 247 00:12:58,970 --> 00:13:03,950 and the review questions that we have for the final exam 248 00:13:03,950 --> 00:13:06,150 in their own section on Blackboard. 249 00:13:06,150 --> 00:13:08,623 So that's all for now. Thanks. Bye bye.