1 00:00:01,970 --> 00:00:05,250 - Okay, now we're gonna do topological sorting. 2 00:00:05,250 --> 00:00:08,790 This is our first graph algorithm proper. 3 00:00:08,790 --> 00:00:11,150 Yeah, I know we've looked at a lot of tree algorithms, 4 00:00:11,150 --> 00:00:13,080 but this is one that operates 5 00:00:13,080 --> 00:00:15,440 on the more general class of graphs. 6 00:00:15,440 --> 00:00:17,860 Although there are some restrictions. 7 00:00:17,860 --> 00:00:21,610 So I'm gonna start with a hypothetical, Egbert and Edwina, 8 00:00:21,610 --> 00:00:25,760 my dear friends, are going to take a vacation, 9 00:00:25,760 --> 00:00:28,803 and there are some things that they need to do. 10 00:00:30,140 --> 00:00:34,720 So trying to get organized, they jot down a list 11 00:00:34,720 --> 00:00:38,150 of all the stuff that they think they need to do. 12 00:00:38,150 --> 00:00:41,240 And they see it's a very daunting list. 13 00:00:41,240 --> 00:00:44,250 So they've been taking a class on algorithms, 14 00:00:44,250 --> 00:00:45,830 and they think they're very clever. 15 00:00:45,830 --> 00:00:49,890 So they're going to build a graph of dependencies. 16 00:00:49,890 --> 00:00:51,710 So what is that? 17 00:00:51,710 --> 00:00:56,480 Well, let's take all these tasks that they need to perform, 18 00:00:56,480 --> 00:01:00,900 and we'll assign a node to each task 19 00:01:02,050 --> 00:01:07,050 and then Egbert and Edwina are going to draw an arrow 20 00:01:08,070 --> 00:01:10,910 from each node to another node 21 00:01:10,910 --> 00:01:14,720 when they think one node should precede another 22 00:01:15,810 --> 00:01:17,943 in their list of things to do. 23 00:01:19,270 --> 00:01:23,380 So they do all that, and you see there's purchase tickets 24 00:01:23,380 --> 00:01:24,893 leads to get seat assignments. 25 00:01:26,790 --> 00:01:30,110 And get seat assignments precedes check-in, 26 00:01:30,110 --> 00:01:33,480 and pack bags precedes feed the fish 27 00:01:33,480 --> 00:01:35,660 but this is kind of a rat's nest. 28 00:01:35,660 --> 00:01:37,180 It's a mess. 29 00:01:37,180 --> 00:01:40,480 And they're not quite sure how to use this information 30 00:01:40,480 --> 00:01:44,820 in order to make a list of the things that they need to do 31 00:01:44,820 --> 00:01:47,310 in the order that they need to do them. 32 00:01:47,310 --> 00:01:48,840 So the first thing they decide to do 33 00:01:48,840 --> 00:01:51,570 is get these words out of the way. 34 00:01:51,570 --> 00:01:54,140 They realize that they have a labeled graph, 35 00:01:54,140 --> 00:01:56,590 and so they can use those labels. 36 00:01:56,590 --> 00:01:58,120 So they get them out of the way 37 00:01:58,120 --> 00:01:59,970 so that they can get a clear picture of the graph. 38 00:01:59,970 --> 00:02:02,620 And now we see node A is associated 39 00:02:02,620 --> 00:02:03,940 with get seat assignments. 40 00:02:03,940 --> 00:02:07,100 Node B is associated with make sure the oven is off 41 00:02:07,100 --> 00:02:08,150 and things like that. 42 00:02:09,250 --> 00:02:11,593 And then they puzzle over what to do. 43 00:02:12,660 --> 00:02:15,080 What now? Okay. 44 00:02:15,080 --> 00:02:19,200 Well, Edwina has the brilliant observation 45 00:02:19,200 --> 00:02:22,150 that Node G here 46 00:02:22,150 --> 00:02:25,190 has no arrows pointing into it. 47 00:02:25,190 --> 00:02:27,800 It only has arrows going out of it. 48 00:02:27,800 --> 00:02:29,493 And when they take a look at the graph, 49 00:02:29,493 --> 00:02:31,400 when they take a look at it more carefully, 50 00:02:31,400 --> 00:02:35,850 they realize that this is the only node in this graph 51 00:02:35,850 --> 00:02:38,990 that has out arrows, but no in arrows. 52 00:02:38,990 --> 00:02:42,370 So they realize that this means 53 00:02:42,370 --> 00:02:45,380 that nothing precedes this particular step, 54 00:02:45,380 --> 00:02:47,980 which is purchasing tickets. 55 00:02:47,980 --> 00:02:49,430 So they 56 00:02:50,600 --> 00:02:54,220 take G out of the graph and set it aside 57 00:02:54,220 --> 00:02:57,053 and remove all the edges that were associated with it. 58 00:02:58,290 --> 00:03:01,200 And then they take a look and say, well, gee, what's next? 59 00:03:01,200 --> 00:03:05,670 And they see now there are two nodes 60 00:03:05,670 --> 00:03:07,943 that have no edges going into them. 61 00:03:08,980 --> 00:03:11,890 And so they decide they're gonna do the same thing. 62 00:03:11,890 --> 00:03:14,890 They're gonna take one node out of the graph. 63 00:03:14,890 --> 00:03:18,240 And so they pick one node, they pick D, 64 00:03:18,240 --> 00:03:22,530 and they take that out of the graph and set it next to G, 65 00:03:22,530 --> 00:03:24,223 and they delete all the edges. 66 00:03:25,330 --> 00:03:26,573 So what's next? 67 00:03:27,680 --> 00:03:32,680 Well, now there are four nodes that have no in edges. 68 00:03:32,840 --> 00:03:37,840 They're A, B, K, and E all have edges going out of them, 69 00:03:38,420 --> 00:03:40,020 but no edges going in. 70 00:03:40,020 --> 00:03:42,790 So again, they realize that these are things 71 00:03:42,790 --> 00:03:46,210 that anything that's preceded them is already removed 72 00:03:46,210 --> 00:03:48,840 from the graph so they can pick any one of these. 73 00:03:48,840 --> 00:03:52,180 So they pick A, and they remove that from the graph, 74 00:03:52,180 --> 00:03:54,280 and they repeat this process. 75 00:03:54,280 --> 00:03:58,450 So at the next step, E, B, and K have no in edges. 76 00:03:58,450 --> 00:04:01,943 And so they pick one, K, and they remove it from the graph. 77 00:04:03,640 --> 00:04:05,980 Then again, now they look, they see E and B 78 00:04:05,980 --> 00:04:08,910 are the only two nodes with no in edges. 79 00:04:08,910 --> 00:04:12,513 And so they pick one B, and they remove it from the graph. 80 00:04:14,270 --> 00:04:17,113 Again, they remove E from the graph. 81 00:04:17,990 --> 00:04:21,140 And now there's only one. 82 00:04:21,140 --> 00:04:22,500 So that's easy. 83 00:04:22,500 --> 00:04:24,083 They take I from the graph. 84 00:04:24,920 --> 00:04:26,990 Again, there's only one, F. 85 00:04:26,990 --> 00:04:28,940 Everything else has an in edge. 86 00:04:31,610 --> 00:04:33,433 Now, J is the only one. 87 00:04:37,320 --> 00:04:40,243 Now C is the only one with no in edges. 88 00:04:41,400 --> 00:04:42,950 And we get down to the last two, 89 00:04:42,950 --> 00:04:44,850 and you see how that's gonna work out. 90 00:04:47,120 --> 00:04:49,393 So what they've produced, 91 00:04:51,160 --> 00:04:54,410 they use this ordering here that we have at the bottom, 92 00:04:54,410 --> 00:04:57,210 the order that they removed these nodes from the graph, 93 00:04:57,210 --> 00:04:59,400 and they use that to reorder their list. 94 00:04:59,400 --> 00:05:03,080 And now they have a list that tells them 95 00:05:03,080 --> 00:05:06,200 what order they should perform all of these steps, 96 00:05:06,200 --> 00:05:09,120 purchase tickets, pack bags, get the seat assignments, 97 00:05:09,120 --> 00:05:12,290 feed the fish, make sure the oven is off, water the plants. 98 00:05:12,290 --> 00:05:14,690 Now you can lock the front door. 99 00:05:14,690 --> 00:05:16,220 Now that you've locked the front door, 100 00:05:16,220 --> 00:05:18,440 you can leave the key with the neighbor. 101 00:05:18,440 --> 00:05:21,010 Only then can you drive to the airport, park the car, 102 00:05:21,010 --> 00:05:22,510 check in, and board the plane. 103 00:05:23,910 --> 00:05:27,313 So this is what's called a topological sort. 104 00:05:29,400 --> 00:05:31,643 And to be more formal about it, 105 00:05:32,510 --> 00:05:36,993 topological sorting is we're given some graph G, 106 00:05:38,220 --> 00:05:41,630 you produce a linear ordering of its nodes 107 00:05:41,630 --> 00:05:44,500 such that every directed edge, AB, 108 00:05:44,500 --> 00:05:49,250 from node A to node B, for each one of those edges, 109 00:05:49,250 --> 00:05:51,653 A comes before B in the ordering. 110 00:05:52,690 --> 00:05:55,200 Now it should probably come as no surprise 111 00:05:55,200 --> 00:05:58,960 that you can do this with directed, acyclic graphs only, 112 00:05:58,960 --> 00:06:02,500 the graph has to be directed because they're directed edges. 113 00:06:02,500 --> 00:06:07,500 And you can't have a cycle because then there's this loop, 114 00:06:07,590 --> 00:06:09,810 and there's no end to it. 115 00:06:09,810 --> 00:06:14,163 And so there's no preceding a relationship. 116 00:06:15,890 --> 00:06:18,123 Cycles break that kind of a relationship. 117 00:06:19,050 --> 00:06:20,150 It's also important to note 118 00:06:20,150 --> 00:06:22,340 that some directed acyclic graphs 119 00:06:22,340 --> 00:06:24,540 have more than one topological sort. 120 00:06:24,540 --> 00:06:27,690 So to go back to the example here earlier, 121 00:06:27,690 --> 00:06:31,300 every time we had a choice of nodes 122 00:06:31,300 --> 00:06:33,023 to remove from the graph, 123 00:06:34,730 --> 00:06:38,060 that's a branch in the possibilities 124 00:06:38,060 --> 00:06:40,160 of the topological sorts 125 00:06:40,160 --> 00:06:42,240 that we could have produced for that graph. 126 00:06:42,240 --> 00:06:44,410 So some directed acyclic graphs 127 00:06:44,410 --> 00:06:46,740 will have one topological sort. 128 00:06:46,740 --> 00:06:50,760 Some may have more than one topological sort, 129 00:06:50,760 --> 00:06:54,290 but every directed acyclic graph 130 00:06:54,290 --> 00:06:57,183 has at least one topological sort. 131 00:07:00,070 --> 00:07:02,180 And there's a directed, acyclic graph. 132 00:07:02,180 --> 00:07:05,350 We know it is even without examining it any further 133 00:07:05,350 --> 00:07:08,930 because we've just performed a sort on it, 134 00:07:08,930 --> 00:07:10,700 and it's worked okay. 135 00:07:10,700 --> 00:07:13,130 Here's some simpler examples. 136 00:07:13,130 --> 00:07:16,650 On the left, we have directed, acyclic graph, 137 00:07:16,650 --> 00:07:19,370 and on the right, we have a graph that's directed, 138 00:07:19,370 --> 00:07:20,430 but not acyclic. 139 00:07:20,430 --> 00:07:23,263 We see it contains this cycle highlighted in yellow. 140 00:07:25,540 --> 00:07:27,770 Now here's some pseudocode 141 00:07:27,770 --> 00:07:30,340 for implementing topological sort. 142 00:07:30,340 --> 00:07:33,680 And as you might expect, it's pretty simple. 143 00:07:33,680 --> 00:07:35,270 We start with an empty list 144 00:07:35,270 --> 00:07:37,810 to hold the topologically sorted nodes. 145 00:07:37,810 --> 00:07:40,750 And we start with a list of all the nodes 146 00:07:40,750 --> 00:07:44,020 with no incoming edges, all right? 147 00:07:44,020 --> 00:07:47,880 In the first example, there was only one such node, G, 148 00:07:47,880 --> 00:07:49,713 but there may be more than one. 149 00:07:50,610 --> 00:07:53,890 And as long as that set is not empty, 150 00:07:53,890 --> 00:07:56,800 we remove a node from that set. 151 00:07:56,800 --> 00:07:58,640 If there's more than one, we can pick any one. 152 00:07:58,640 --> 00:07:59,540 It doesn't matter. 153 00:08:00,660 --> 00:08:03,520 We append that node to the list t 154 00:08:03,520 --> 00:08:06,570 that holds the topologically sorted nodes. 155 00:08:06,570 --> 00:08:10,100 And then for each node with an edge e from n to m, 156 00:08:10,100 --> 00:08:12,430 we remove the edge from the graph 157 00:08:12,430 --> 00:08:15,670 just as we saw in the previous example. 158 00:08:15,670 --> 00:08:18,720 And if m has no incoming edges, 159 00:08:18,720 --> 00:08:22,920 then we insert that into, oh, that's a mistake. 160 00:08:22,920 --> 00:08:25,893 Let me fix that right now, into s. 161 00:08:28,060 --> 00:08:32,260 If m has no incoming edges, then we insert m into that set 162 00:08:32,260 --> 00:08:34,713 of all nodes with no incoming edges. 163 00:08:35,720 --> 00:08:38,820 And this while loop will execute 164 00:08:38,820 --> 00:08:43,133 until there's nothing left in s, 165 00:08:44,470 --> 00:08:47,630 and if we get to this point and the graph still has edges, 166 00:08:47,630 --> 00:08:50,130 then we know we've encountered a cycle, 167 00:08:50,130 --> 00:08:52,070 and so we return an error. 168 00:08:52,070 --> 00:08:55,420 Otherwise, we return t, this empty list, 169 00:08:55,420 --> 00:08:57,723 to hold the topologically sorted nodes. 170 00:08:58,710 --> 00:09:03,660 So with this smaller example, we start off, 171 00:09:03,660 --> 00:09:07,780 and A is the only node that has no incoming edges, 172 00:09:07,780 --> 00:09:09,350 so that's in set s. 173 00:09:09,350 --> 00:09:10,580 Set t is empty. 174 00:09:10,580 --> 00:09:13,093 I shouldn't call them sets. Those are vectors. 175 00:09:14,200 --> 00:09:15,513 Bad instructor. 176 00:09:16,430 --> 00:09:18,140 Okay, and then we remove A, 177 00:09:18,140 --> 00:09:22,890 and we find that F now has no incoming edges. 178 00:09:22,890 --> 00:09:25,440 And so we add F to s. 179 00:09:25,440 --> 00:09:29,640 And then we take F, and we put it in t, 180 00:09:29,640 --> 00:09:32,130 and we find that G has no incoming edges, 181 00:09:32,130 --> 00:09:33,330 just like we did before. 182 00:09:38,710 --> 00:09:41,420 And we've produced a topological sort. 183 00:09:41,420 --> 00:09:43,250 Now, one thing that I think is really cool 184 00:09:43,250 --> 00:09:47,400 about a topological sort is when you're done 185 00:09:47,400 --> 00:09:51,440 if you were to redraw all of those edges back in it, 186 00:09:51,440 --> 00:09:55,713 you remember, we, when we took the nodes out of the graph, 187 00:09:56,690 --> 00:09:58,400 we didn't include the edges. 188 00:09:58,400 --> 00:10:00,030 But if you draw the edges back in, 189 00:10:00,030 --> 00:10:01,760 now all the edges, you'll notice, 190 00:10:01,760 --> 00:10:04,450 are moving in the same direction. 191 00:10:04,450 --> 00:10:06,500 We've produced a linear order, 192 00:10:06,500 --> 00:10:08,420 and nothing is going backwards. 193 00:10:08,420 --> 00:10:10,840 Everything is going forwards. 194 00:10:10,840 --> 00:10:14,220 And so that's a neat property that we introduce 195 00:10:14,220 --> 00:10:16,550 in the particular drawing of the graph 196 00:10:16,550 --> 00:10:18,523 when we perform a topological sort. 197 00:10:19,870 --> 00:10:24,870 And by the same token, this is the graph of Edwina 198 00:10:25,100 --> 00:10:28,000 and, what was his name? 199 00:10:28,000 --> 00:10:31,150 Egbert's vacation. 200 00:10:31,150 --> 00:10:33,940 And this is the representation of it 201 00:10:33,940 --> 00:10:35,700 given a topological sort. 202 00:10:35,700 --> 00:10:38,320 And again, you'll see that all of the arrows 203 00:10:38,320 --> 00:10:41,270 are moving in the forward direction. 204 00:10:41,270 --> 00:10:43,400 So I think that's kind of cool. 205 00:10:43,400 --> 00:10:47,780 And that's really all there is to topological sort. 206 00:10:47,780 --> 00:10:49,380 What's the complexity? 207 00:10:49,380 --> 00:10:52,950 Well, to process the entire graph, 208 00:10:52,950 --> 00:10:56,070 we have to go through all the vertices and all the edges. 209 00:10:56,070 --> 00:10:58,720 We have to remove the vertices one at a time. 210 00:10:58,720 --> 00:11:00,910 And we have to remove all the edges 211 00:11:01,840 --> 00:11:04,060 that are incident to the vertices that we remove, 212 00:11:04,060 --> 00:11:05,820 or nodes that we remove. 213 00:11:05,820 --> 00:11:09,423 And so the complexity in the worst case, 214 00:11:10,410 --> 00:11:14,370 generally in fact, is big O of the cardinality of V 215 00:11:14,370 --> 00:11:16,630 plus the cardinality of E. 216 00:11:16,630 --> 00:11:21,090 And now you know how to do topological sorts. Super cool. 217 00:11:21,090 --> 00:11:22,683 All right, that's it for now.