1 00:00:01,260 --> 00:00:02,093 - [Instructor] Hi. 2 00:00:02,093 --> 00:00:05,660 This is a continuation of our introduction to graphs. 3 00:00:05,660 --> 00:00:08,270 This is a sneak preview into some graph theory 4 00:00:08,270 --> 00:00:10,960 that you'll see in CS 224 5 00:00:10,960 --> 00:00:14,330 and you should treat this as supplementary. 6 00:00:14,330 --> 00:00:18,530 I'm not going to assess you on any material in this video, 7 00:00:18,530 --> 00:00:21,330 so treat this as purely optional, 8 00:00:21,330 --> 00:00:24,390 but I think it's interesting material. 9 00:00:24,390 --> 00:00:28,370 And if you do plan on taking CS 224, which I recommend. 10 00:00:28,370 --> 00:00:30,960 It's the advanced algorithms class, 11 00:00:30,960 --> 00:00:34,800 we call it Algorithm Design and Analysis, 12 00:00:34,800 --> 00:00:36,100 then you'll see some of the concepts 13 00:00:36,100 --> 00:00:38,180 that I'm gonna present here. 14 00:00:38,180 --> 00:00:40,380 And this should help you get up to speed. 15 00:00:40,380 --> 00:00:42,860 But in any event, I hope you find it interesting, 16 00:00:42,860 --> 00:00:46,990 but you won't be examined or quizzed or anything else 17 00:00:46,990 --> 00:00:49,380 with regard to this material. 18 00:00:49,380 --> 00:00:54,380 So let's start with these graphs here. 19 00:00:56,050 --> 00:00:57,973 What do these graphs have in common? 20 00:01:01,060 --> 00:01:03,450 Well, these graphs are fully connected. 21 00:01:03,450 --> 00:01:06,830 Each node is connected to every other node 22 00:01:06,830 --> 00:01:08,700 and we have a special term for this. 23 00:01:08,700 --> 00:01:11,110 We call these complete graphs. 24 00:01:11,110 --> 00:01:14,010 And these are so special that we even have names for them. 25 00:01:15,450 --> 00:01:18,920 We call the complete graph on two vertices K2, 26 00:01:18,920 --> 00:01:22,703 the complete graph on three vertices, K3, and so on. 27 00:01:26,200 --> 00:01:30,920 Now, if we can draw a graph on a 2D surface 28 00:01:30,920 --> 00:01:34,360 without having any of its edges cross, 29 00:01:34,360 --> 00:01:36,603 we call such a graph planar. 30 00:01:37,520 --> 00:01:41,900 So, here in the left, we see K3 is clearly planar, 31 00:01:41,900 --> 00:01:45,220 because we can draw the three vertices, 32 00:01:45,220 --> 00:01:49,040 we can orient them in our drawing in such a way 33 00:01:49,040 --> 00:01:53,030 that the three edges here, none of the cross. 34 00:01:53,030 --> 00:01:55,257 Actually, it's impossible (speaking faintly). 35 00:01:57,423 --> 00:01:59,010 What about K4? 36 00:01:59,010 --> 00:02:01,960 Here we see edges crossing, 37 00:02:01,960 --> 00:02:05,520 but is this the only way to draw K4? 38 00:02:05,520 --> 00:02:10,320 Is it possible to draw K4 on a plane surface 39 00:02:10,320 --> 00:02:13,793 in such a way that none of the edges cross? 40 00:02:14,780 --> 00:02:17,690 And, you know, we could take this one edge here 41 00:02:17,690 --> 00:02:20,050 and stretch it to the outside 42 00:02:20,050 --> 00:02:23,783 and make a drawing that looks kinda like this. 43 00:02:24,760 --> 00:02:26,560 And that's okay, that works, 44 00:02:26,560 --> 00:02:30,403 so now we know that K4 is definitely planar. 45 00:02:31,580 --> 00:02:33,930 It's a little more elegant to draw it this way. 46 00:02:35,230 --> 00:02:37,730 It's sort of a tetrahedral kind of a relationship. 47 00:02:38,950 --> 00:02:43,510 So this is what we mean when we say a graph is planar, 48 00:02:43,510 --> 00:02:46,970 is that we can draw it in 2D 49 00:02:46,970 --> 00:02:49,023 without having any edges cross. 50 00:02:51,150 --> 00:02:54,900 K5, on the other hand, is definitely not planar 51 00:02:54,900 --> 00:02:59,773 and K5 turns up in a lot of proofs in graph theory. 52 00:03:01,858 --> 00:03:04,730 If K5 is found anywhere on a graph, 53 00:03:04,730 --> 00:03:06,560 then the graph is not planar. 54 00:03:06,560 --> 00:03:11,050 So we could hang all kinds of other nodes off of this. 55 00:03:11,050 --> 00:03:14,680 If anywhere on the graph there appears K5, 56 00:03:14,680 --> 00:03:16,877 the graph is not planar. 57 00:03:20,318 --> 00:03:23,680 Now, here's something different that you'll see in CS 224. 58 00:03:23,680 --> 00:03:26,430 Take a look at these two graphs and what do you notice? 59 00:03:31,860 --> 00:03:36,540 Well, A, B, and C form what's called an independent set, 60 00:03:36,540 --> 00:03:38,870 that is none of those vertices, 61 00:03:38,870 --> 00:03:41,000 or none of those nodes, I should say, 62 00:03:41,000 --> 00:03:42,950 shares an edge with any of the others. 63 00:03:42,950 --> 00:03:46,400 So we here we see A, B, C. 64 00:03:46,400 --> 00:03:50,000 A is connected to D, F, and E, 65 00:03:50,000 --> 00:03:52,300 but there's no edge that goes between A and C. 66 00:03:52,300 --> 00:03:55,100 There's no edge that goes between A and B, 67 00:03:55,100 --> 00:03:57,940 and there's no edge that goes between B and C. 68 00:03:57,940 --> 00:04:01,340 So we call that an independent set. 69 00:04:01,340 --> 00:04:03,620 And the same applies to D, E, and F. 70 00:04:03,620 --> 00:04:07,230 You'll see here D, E, and F are connected to A, B, and C, 71 00:04:07,230 --> 00:04:09,480 but there's no edge between F and E, 72 00:04:09,480 --> 00:04:10,940 no edge between F and D, 73 00:04:10,940 --> 00:04:13,433 and no edge between D and E. 74 00:04:14,320 --> 00:04:16,333 And to make this more clear, 75 00:04:18,000 --> 00:04:19,600 we get the picture on the right, 76 00:04:20,450 --> 00:04:23,830 where we've segregated the vertices here, 77 00:04:23,830 --> 00:04:25,220 A, B, and C on one side, 78 00:04:25,220 --> 00:04:28,040 and D, E, and F on the other side. 79 00:04:28,040 --> 00:04:30,500 Remember that these are exactly the same graph, 80 00:04:30,500 --> 00:04:32,120 they're just drawn differently. 81 00:04:32,120 --> 00:04:35,570 So the edge relationships are exactly the same 82 00:04:35,570 --> 00:04:36,490 in both cases. 83 00:04:36,490 --> 00:04:37,910 We're just drawing them differently 84 00:04:37,910 --> 00:04:42,070 so that we can visualize a particular characteristic 85 00:04:42,070 --> 00:04:42,903 of this graph. 86 00:04:43,930 --> 00:04:48,400 And so we have this set containing A, B, and C, 87 00:04:48,400 --> 00:04:50,240 this independent set, 88 00:04:50,240 --> 00:04:54,440 and we have this independent set containing D, E, and F. 89 00:04:54,440 --> 00:04:58,010 Members of these sets share no common edges. 90 00:04:58,010 --> 00:05:02,170 And graphs like this we call bipartite graphs 91 00:05:02,170 --> 00:05:05,320 and we've recently learned about equivalence relations 92 00:05:05,320 --> 00:05:08,150 and it turns out that belonging to an independent set 93 00:05:08,150 --> 00:05:09,573 is an equivalence relation. 94 00:05:10,950 --> 00:05:12,650 And where do these turn up? 95 00:05:12,650 --> 00:05:15,890 Well, bipartite graphs have broad application 96 00:05:15,890 --> 00:05:17,530 and there's all kinds of algorithms 97 00:05:17,530 --> 00:05:18,863 that operate on them. 98 00:05:19,710 --> 00:05:22,210 And one common application is called matching, 99 00:05:22,210 --> 00:05:24,800 and I won't go into too much detail here, 100 00:05:24,800 --> 00:05:28,410 but here's one example of a matching. 101 00:05:28,410 --> 00:05:30,810 On one side, we have high school students, 102 00:05:30,810 --> 00:05:33,080 and on the other side we have the colleges 103 00:05:33,080 --> 00:05:34,630 to which they've been admitted. 104 00:05:37,010 --> 00:05:39,650 By the same token, we could have the same students 105 00:05:39,650 --> 00:05:41,490 and the things to which they're allergic. 106 00:05:41,490 --> 00:05:44,290 So one class of algorithms is designed 107 00:05:44,290 --> 00:05:47,600 to find matchings in a graph, 108 00:05:47,600 --> 00:05:49,800 and another class of algorithms is intended 109 00:05:49,800 --> 00:05:53,450 to produce matchings with certain desirable properties. 110 00:05:53,450 --> 00:05:56,580 For example, the famous Gale-Shapley algorithm 111 00:05:58,090 --> 00:05:59,850 produces stable matchings 112 00:05:59,850 --> 00:06:03,970 and it's used to assign medical students 113 00:06:05,239 --> 00:06:06,843 to medical colleges. 114 00:06:08,230 --> 00:06:12,080 But let's conclude this little sneak preview 115 00:06:12,080 --> 00:06:13,440 and move on to the problem 116 00:06:13,440 --> 00:06:16,853 of how we represent a graph with some data structure. 117 00:06:17,710 --> 00:06:20,400 Now, in the other lecture, 118 00:06:20,400 --> 00:06:25,400 I introduced the adjacency list 119 00:06:25,830 --> 00:06:27,620 and the adjacency matrix 120 00:06:27,620 --> 00:06:29,750 and I mentioned that there was such a thing called 121 00:06:29,750 --> 00:06:31,340 an incidence matrix, 122 00:06:31,340 --> 00:06:33,400 but we didn't go into it there. 123 00:06:33,400 --> 00:06:35,150 We're going to go into it here. 124 00:06:35,150 --> 00:06:37,530 So we're gonna look at this one more data structure, 125 00:06:37,530 --> 00:06:39,330 the incidence matrix. 126 00:06:39,330 --> 00:06:41,700 Now, remember the meaning of incidence. 127 00:06:41,700 --> 00:06:43,630 An edge is incident to a vertex 128 00:06:43,630 --> 00:06:47,010 if the vertex is one of its endpoints, I should say node. 129 00:06:47,010 --> 00:06:47,900 Let me say that again. 130 00:06:47,900 --> 00:06:50,590 An edge is incident to a node 131 00:06:50,590 --> 00:06:53,430 if the node is one of its endpoints. 132 00:06:53,430 --> 00:06:56,360 And the incidence matrix requires 133 00:06:56,360 --> 00:06:59,650 that we label edges and nodes. 134 00:06:59,650 --> 00:07:02,830 And so here you'll see, in the diagram on the left, 135 00:07:02,830 --> 00:07:07,830 we have edge labels e1, e2, e3, and so on through e8. 136 00:07:09,410 --> 00:07:10,960 And don't confuse these with weights. 137 00:07:10,960 --> 00:07:11,940 There are just labels. 138 00:07:11,940 --> 00:07:15,020 We could call them AB, BC, and so on, 139 00:07:15,020 --> 00:07:18,320 but I've adopted this notation 140 00:07:18,320 --> 00:07:19,640 for this particular incidence, 141 00:07:19,640 --> 00:07:23,013 because I think it's more concise and elegant. 142 00:07:23,980 --> 00:07:25,140 So these are just labels. 143 00:07:25,140 --> 00:07:27,980 These are labeling this edge and this edge and this edge, 144 00:07:27,980 --> 00:07:30,753 just like we have labels on our nodes. 145 00:07:31,890 --> 00:07:34,100 So what we do in an incidence matrix 146 00:07:34,100 --> 00:07:38,720 is that we have the edges on one axis, 147 00:07:38,720 --> 00:07:42,150 and these numbers correspond to the subscripts here 148 00:07:42,150 --> 00:07:43,520 of each of these edges. 149 00:07:43,520 --> 00:07:48,520 So this column is going to tell us things about e1 150 00:07:48,530 --> 00:07:51,723 and this column is going to tell us things about e2. 151 00:07:53,080 --> 00:07:58,080 And so we have nodes on the other axis, 152 00:07:58,270 --> 00:08:00,360 these are the node labels, 153 00:08:00,360 --> 00:08:04,430 and what we do is we put a one in the matrix 154 00:08:04,430 --> 00:08:08,670 if a node is an endpoint of an edge, and zero otherwise. 155 00:08:08,670 --> 00:08:12,500 So again, this is our matrix notation 156 00:08:12,500 --> 00:08:14,380 with v and e as subscripts. 157 00:08:14,380 --> 00:08:17,780 V is the vertices, e is the edges. 158 00:08:17,780 --> 00:08:19,473 So let's see how that works. 159 00:08:21,490 --> 00:08:23,670 So here we'll proceed by columns, 160 00:08:23,670 --> 00:08:25,610 each column representing an edge. 161 00:08:25,610 --> 00:08:29,300 So let's start with edge one. 162 00:08:29,300 --> 00:08:30,980 So here we have edge one 163 00:08:30,980 --> 00:08:35,980 and we see that it's incident to A and B, 164 00:08:36,040 --> 00:08:39,530 and so for A, we put a one, for B, we put a one, 165 00:08:39,530 --> 00:08:42,863 and we put zeros on all the other positions. 166 00:08:43,740 --> 00:08:47,620 For edge two, edge two is incident to A 167 00:08:47,620 --> 00:08:49,200 and incident to D, 168 00:08:49,200 --> 00:08:51,870 so we put a one next to A here 169 00:08:51,870 --> 00:08:54,430 and a one here for D 170 00:08:54,430 --> 00:08:56,730 and all the others are zeros. 171 00:08:56,730 --> 00:09:01,143 And similarly for the rest of the incidence matrix. 172 00:09:04,350 --> 00:09:08,170 And so finally, we end up at edge eight, 173 00:09:08,170 --> 00:09:10,230 which is incident to E and F. 174 00:09:10,230 --> 00:09:12,730 You see those down in the lower right-hand corner. 175 00:09:15,450 --> 00:09:17,010 Now, you may wonder what we do 176 00:09:17,010 --> 00:09:18,800 in the case of a directed graph. 177 00:09:18,800 --> 00:09:23,433 Well, in a directed graph, we use negative one. 178 00:09:24,400 --> 00:09:28,540 We use negative one if edge e leaves node v, 179 00:09:28,540 --> 00:09:32,490 and we use positive one if edge e enters node v, 180 00:09:32,490 --> 00:09:33,810 or zero otherwise. 181 00:09:33,810 --> 00:09:37,490 So here we have edge e1 182 00:09:37,490 --> 00:09:41,740 and we would put a negative one 183 00:09:41,740 --> 00:09:45,000 as the entry corresponding to e1 and A, 184 00:09:45,000 --> 00:09:48,220 and a positive one for the entry corresponding 185 00:09:48,220 --> 00:09:49,910 to e1 and B, 186 00:09:49,910 --> 00:09:54,860 because e1 leaves A and enters B. 187 00:09:55,740 --> 00:09:57,660 Okay, so let's see what 188 00:09:57,660 --> 00:10:00,360 the incidence matrix looks like for those. 189 00:10:00,360 --> 00:10:01,900 Again, we'll proceed by columns, 190 00:10:01,900 --> 00:10:04,530 with each column representing an edge. 191 00:10:04,530 --> 00:10:07,200 So you see here e1, it leaves A, 192 00:10:07,200 --> 00:10:09,320 so we've got a negative one here. 193 00:10:09,320 --> 00:10:11,760 It enters B, so we have a positive one here, 194 00:10:11,760 --> 00:10:13,660 and all the rest are zeros. 195 00:10:13,660 --> 00:10:17,413 Have to be because each edge has exactly two components. 196 00:10:19,830 --> 00:10:21,290 And then the same thing. 197 00:10:21,290 --> 00:10:25,220 Here's e2, it leaves D and enters A, 198 00:10:25,220 --> 00:10:27,080 so there's a minus one here 199 00:10:27,080 --> 00:10:29,500 and a positive one here. 200 00:10:29,500 --> 00:10:33,083 And similarly for the rest of the incidence matrix. 201 00:10:34,990 --> 00:10:37,480 Now, notice here that the columns 202 00:10:37,480 --> 00:10:39,580 in the case of a directed graph, 203 00:10:39,580 --> 00:10:43,210 the columns of an incidence matrix will sum to zero 204 00:10:44,476 --> 00:10:47,190 and each row gives the net input to a node. 205 00:10:47,190 --> 00:10:48,640 So if we take a look 206 00:10:48,640 --> 00:10:52,310 and we take the sum here for node A, 207 00:10:52,310 --> 00:10:55,320 we see we have minus one, one, zero, 208 00:10:55,320 --> 00:11:00,013 so there's an equivalent number of in edges and out edges. 209 00:11:01,160 --> 00:11:03,800 This is not true for B, for example, 210 00:11:03,800 --> 00:11:05,760 we see that we have a net of one. 211 00:11:05,760 --> 00:11:09,007 We have one, negative one, and one. 212 00:11:09,007 --> 00:11:14,007 And so that tells us that B has one more in edge 213 00:11:14,810 --> 00:11:16,040 than it has out edges. 214 00:11:16,040 --> 00:11:18,610 And if we check, yeah, that's exactly the case. 215 00:11:18,610 --> 00:11:22,110 We see we have two in edges and one out edge. 216 00:11:22,110 --> 00:11:24,270 So those are some interesting properties 217 00:11:24,270 --> 00:11:26,220 of the incidence matrix. 218 00:11:26,220 --> 00:11:28,180 And I won't go into the applications here, 219 00:11:28,180 --> 00:11:30,870 I just wanted to show you the incidence matrix, 220 00:11:30,870 --> 00:11:33,540 that there are other ways of representing a graph 221 00:11:34,800 --> 00:11:37,680 than just the two that we saw in the previous lecture, 222 00:11:37,680 --> 00:11:40,770 but there are a number of applications 223 00:11:40,770 --> 00:11:43,920 for which incidence matrix provides 224 00:11:43,920 --> 00:11:47,290 the more efficient data structure. 225 00:11:47,290 --> 00:11:52,290 In general, the complexity of the incidence matrix 226 00:11:54,150 --> 00:11:58,571 is big O of V times the cardinality of V 227 00:11:58,571 --> 00:12:00,193 times the cardinality of E. 228 00:12:01,770 --> 00:12:05,070 And again, it has some interesting mathematical properties 229 00:12:05,070 --> 00:12:07,530 and it's used in more advanced applications, 230 00:12:07,530 --> 00:12:11,150 such as polytopes and problems in combinatorics. 231 00:12:11,150 --> 00:12:14,470 And if you don't know what polytopes are and combinatorics, 232 00:12:14,470 --> 00:12:16,460 don't worry now, 233 00:12:16,460 --> 00:12:21,460 but I hope that you see these things in later classes. 234 00:12:21,990 --> 00:12:24,450 And that concludes this little sneak preview 235 00:12:24,450 --> 00:12:28,143 for the graph theory that you might see in CS 224. 236 00:12:29,440 --> 00:12:30,273 Thanks.