WEBVTT 1 00:00:00.270 --> 00:00:03.060 All right, picking the right chart. 2 00:00:03.060 --> 00:00:05.010 And by the way, I should probably start this off 3 00:00:05.010 --> 00:00:07.530 by saying that's a terrible title. 4 00:00:07.530 --> 00:00:08.867 That's not what we should be calling this 5 00:00:08.867 --> 00:00:10.500 'cause we don't just pick charts. 6 00:00:10.500 --> 00:00:12.060 I mean, we do. 7 00:00:12.060 --> 00:00:13.320 We do just pick charts. 8 00:00:13.320 --> 00:00:15.810 Meaning that we're in our software, 9 00:00:15.810 --> 00:00:17.730 we gotta get this stupid presentation done 10 00:00:17.730 --> 00:00:19.140 or whatever it is we're creating. 11 00:00:19.140 --> 00:00:20.040 We just gotta pick a chart 12 00:00:20.040 --> 00:00:21.240 and move on with our lives, right? 13 00:00:21.240 --> 00:00:24.210 That's how we actually do tend to do it. 14 00:00:24.210 --> 00:00:26.130 But what we're doing, what we should be doing, 15 00:00:26.130 --> 00:00:29.700 is thinking about our data, the context, 16 00:00:29.700 --> 00:00:31.440 what we're trying to accomplish, 17 00:00:31.440 --> 00:00:34.290 and selecting the best visual to accomplish that. 18 00:00:34.290 --> 00:00:36.660 So, you know, maybe a little bit of semantics here 19 00:00:36.660 --> 00:00:38.610 but I don't like that phrasing. 20 00:00:38.610 --> 00:00:39.870 But I use it because, you know, 21 00:00:39.870 --> 00:00:41.910 obviously everybody knows what I mean when I say it. 22 00:00:41.910 --> 00:00:44.550 So, picking charts, yes, let's talk about it. 23 00:00:44.550 --> 00:00:46.650 How should we do that? 24 00:00:46.650 --> 00:00:50.670 Well, what we have to realize and acknowledge 25 00:00:50.670 --> 00:00:52.770 is that to do it properly, 26 00:00:52.770 --> 00:00:56.040 we have to consider a whole bunch of different questions 27 00:00:56.040 --> 00:00:58.110 and we gotta focus on the right questions 28 00:00:58.110 --> 00:01:00.330 that we should be asking ourselves. 
29 00:01:00.330 --> 00:01:03.720 Starting off with the foundational questions 30 00:01:03.720 --> 00:01:05.400 which we've already talked about. 31 00:01:05.400 --> 00:01:07.230 We have to focus in on our KWYs, right? 32 00:01:07.230 --> 00:01:08.550 I keep saying it. 33 00:01:08.550 --> 00:01:09.630 What's the data saying? 34 00:01:09.630 --> 00:01:10.920 What type of data do I have? 35 00:01:10.920 --> 00:01:11.940 What's the shape of the data? 36 00:01:11.940 --> 00:01:13.740 How many variables are there? 37 00:01:13.740 --> 00:01:17.490 That's gonna affect different charts being possible, right? 38 00:01:17.490 --> 00:01:19.770 If I have 14 variables to express, 39 00:01:19.770 --> 00:01:21.150 I'm probably not gonna use a bar chart 40 00:01:21.150 --> 00:01:22.590 which is really only good at one or two 41 00:01:22.590 --> 00:01:23.763 at the most, okay? 42 00:01:24.660 --> 00:01:26.880 What does my audience need to hear? 43 00:01:26.880 --> 00:01:29.310 Obviously gonna affect how I choose charts, 44 00:01:29.310 --> 00:01:31.830 the type of task I'm enabling them to do. 45 00:01:31.830 --> 00:01:33.030 We'll talk about that. 46 00:01:33.030 --> 00:01:34.230 And what do I really want to say, 47 00:01:34.230 --> 00:01:35.910 which is closely aligned with that. 48 00:01:35.910 --> 00:01:38.190 So these are the foundational questions. 49 00:01:38.190 --> 00:01:42.210 When we are visualizing data, we need to have tunnel vision 50 00:01:42.210 --> 00:01:45.180 on those questions and others. 51 00:01:45.180 --> 00:01:49.920 So, when we're specifically identifying the chart, 52 00:01:49.920 --> 00:01:52.020 which chart is gonna do the job, 53 00:01:52.020 --> 00:01:54.360 one of the key questions that we can focus in on 54 00:01:54.360 --> 00:01:56.467 very, very tightly is, 55 00:01:56.467 --> 00:01:59.610 "What do we want our audience to see?" 56 00:01:59.610 --> 00:02:02.257 And that's how we tend to think about it. 
57 00:02:02.257 --> 00:02:04.260 "What am I showing them?" 58 00:02:04.260 --> 00:02:06.960 Yeah, that's part of it. 59 00:02:06.960 --> 00:02:08.827 But it's also very critically about, 60 00:02:08.827 --> 00:02:11.190 "What do I want my audience to do?" 61 00:02:11.190 --> 00:02:15.330 It's about that action, that task, excuse me, 62 00:02:15.330 --> 00:02:18.720 that you're gonna enable them to perform. 63 00:02:18.720 --> 00:02:21.450 Now it's a subtle difference and, you know, 64 00:02:21.450 --> 00:02:22.980 maybe you don't always think about it that way 65 00:02:22.980 --> 00:02:26.010 or maybe you always think about it as tasks, which is great. 66 00:02:26.010 --> 00:02:28.620 The fact is, those are different. 67 00:02:28.620 --> 00:02:30.120 I want my audience to see something 68 00:02:30.120 --> 00:02:32.640 is different from I want them to do something. 69 00:02:32.640 --> 00:02:35.040 So what kinds of things do we want our audience 70 00:02:35.040 --> 00:02:37.560 to do and or see? 71 00:02:37.560 --> 00:02:42.150 They kind of come in these broad categories. 72 00:02:42.150 --> 00:02:44.520 And this probably isn't a complete list. 73 00:02:44.520 --> 00:02:46.080 There are probably others that I'm not thinking of 74 00:02:46.080 --> 00:02:48.689 at the moment, but for the most part 75 00:02:48.689 --> 00:02:51.720 these are the things, okay? 76 00:02:51.720 --> 00:02:54.307 So then you might ask yourself, 77 00:02:54.307 --> 00:02:58.434 "All right when would I choose to do one of those? 78 00:02:58.434 --> 00:03:00.510 When is distribution the right thing to show them? 79 00:03:00.510 --> 00:03:01.980 Or correlation or whatever, whatever, 80 00:03:01.980 --> 00:03:04.770 if those are gonna be the guides for the charts I select?" 81 00:03:04.770 --> 00:03:06.690 And by the way, we'll talk about which charts you can use 82 00:03:06.690 --> 00:03:08.913 in those circumstances, later on. 
83 00:03:10.710 --> 00:03:13.230 Some of these are simpler than others 84 00:03:13.230 --> 00:03:15.720 but why don't we walk through them one by one, okay? 85 00:03:15.720 --> 00:03:20.720 So the universe of categories, broadly, are these, 86 00:03:20.880 --> 00:03:22.430 starting off with a comparison. 87 00:03:23.460 --> 00:03:28.200 Comparison is not always the job. 88 00:03:28.200 --> 00:03:32.010 It's not always the task at hand, but sometimes it is. 89 00:03:32.010 --> 00:03:36.300 Sometimes I need you to understand that this is bigger 90 00:03:36.300 --> 00:03:39.690 than this and it's about two times the size, right? 91 00:03:39.690 --> 00:03:42.000 So like in the Olympics example that I used 92 00:03:42.000 --> 00:03:43.620 during the "So What?" module, you know, a couple, 93 00:03:43.620 --> 00:03:45.240 few weeks ago, right? 94 00:03:45.240 --> 00:03:48.270 It was all about comparing the average age 95 00:03:48.270 --> 00:03:49.320 of different sports, 96 00:03:49.320 --> 00:03:51.360 different competitors in different sports. 97 00:03:51.360 --> 00:03:53.400 The comparison was part of it. 98 00:03:53.400 --> 00:03:54.960 But there are other tasks where comparison 99 00:03:54.960 --> 00:03:56.430 is not part of it. 100 00:03:56.430 --> 00:03:59.100 For instance, maybe it's really important 101 00:03:59.100 --> 00:04:00.990 to talk about trend or correlation. 102 00:04:00.990 --> 00:04:02.040 These are different tasks. 103 00:04:02.040 --> 00:04:04.140 Sometimes you're doing more than one at the same time. 104 00:04:04.140 --> 00:04:05.430 Okay, fair enough. 105 00:04:05.430 --> 00:04:09.120 Now if comparison is my task, 106 00:04:09.120 --> 00:04:11.940 if that's the number one goal for my audience, 107 00:04:11.940 --> 00:04:14.880 then I'm gonna give 'em a chart that enables comparison. 108 00:04:14.880 --> 00:04:18.000 Bar charts are a chart, not the only one, 109 00:04:18.000 --> 00:04:19.950 they are a chart that enables comparison. 
110 00:04:19.950 --> 00:04:22.860 We are very good at doing distinct value comparison 111 00:04:22.860 --> 00:04:27.180 with good precision when we compare those bars, okay? 112 00:04:27.180 --> 00:04:29.250 So I'm not gonna use a bar chart for a task 113 00:04:29.250 --> 00:04:32.100 it does not perform well, but this is one where it does. 114 00:04:32.100 --> 00:04:33.090 And again, there are alternatives. 115 00:04:33.090 --> 00:04:34.500 We'll talk more about that. 116 00:04:34.500 --> 00:04:37.650 Sometimes I really want my audience to understand 117 00:04:37.650 --> 00:04:40.050 the trend over time, right? 118 00:04:40.050 --> 00:04:42.420 So it's not about comparing where we ended 119 00:04:42.420 --> 00:04:46.380 and where we started, it's more about that flow over time. 120 00:04:46.380 --> 00:04:48.180 When that's the job, I'm gonna pick a chart 121 00:04:48.180 --> 00:04:49.110 that does that well. 122 00:04:49.110 --> 00:04:51.600 For instance, a line chart. 123 00:04:51.600 --> 00:04:54.510 I could visualize that same data with bars, right? 124 00:04:54.510 --> 00:04:56.460 If this is like monthly data, 125 00:04:56.460 --> 00:04:58.500 I could do a bunch of monthly bars. 126 00:04:58.500 --> 00:05:01.680 But we know from research, if I show bars, 127 00:05:01.680 --> 00:05:04.170 I'm more likely to think about comparison. 128 00:05:04.170 --> 00:05:06.810 If I see a line I'm more likely to think about 129 00:05:06.810 --> 00:05:08.220 trend and flow. 130 00:05:08.220 --> 00:05:10.920 So we're choosing charts for a reason. 131 00:05:10.920 --> 00:05:12.810 Next one up is composition. 132 00:05:12.810 --> 00:05:16.230 Sometimes, not always, it's really important 133 00:05:16.230 --> 00:05:18.030 for my audience to understand 134 00:05:18.030 --> 00:05:21.690 the proportional share of something. 135 00:05:21.690 --> 00:05:24.540 The part to whole relationship, the composition. 136 00:05:24.540 --> 00:05:25.470 There are different words. 
137 00:05:25.470 --> 00:05:27.030 The segmentation. 138 00:05:27.030 --> 00:05:29.070 Different ways of expressing this. 139 00:05:29.070 --> 00:05:32.490 So part of data visualization is about translating 140 00:05:32.490 --> 00:05:33.630 what it is that we're talking about 141 00:05:33.630 --> 00:05:35.100 into the actual words that help us 142 00:05:35.100 --> 00:05:38.010 identify the charts that perform those things. 143 00:05:38.010 --> 00:05:39.540 We talked about that in week zero 144 00:05:39.540 --> 00:05:41.490 during data analytics, right? 145 00:05:41.490 --> 00:05:42.450 So there are a bunch of words for this 146 00:05:42.450 --> 00:05:44.610 but the basic idea is if I really want you to understand 147 00:05:44.610 --> 00:05:47.790 that it's about 50% or it's about two thirds or whatever, 148 00:05:47.790 --> 00:05:49.410 I'm gonna use a chart that enables that, 149 00:05:49.410 --> 00:05:54.410 such as a pie chart or a stacked column chart, 150 00:05:55.320 --> 00:05:58.080 or, those aren't the only ones, right? 151 00:05:58.080 --> 00:05:59.010 There are a bunch of them. 152 00:05:59.010 --> 00:06:03.030 One of them that I love is the waffle plot or the grid plot. 153 00:06:03.030 --> 00:06:04.980 This is where I have like a hundred items, 154 00:06:04.980 --> 00:06:06.360 it doesn't have to be a hundred, but, you know, 155 00:06:06.360 --> 00:06:09.840 a bunch of individual items, and I use color 156 00:06:09.840 --> 00:06:12.840 and I could use icons, I can use various things 157 00:06:12.840 --> 00:06:15.000 to show that segmentation. 158 00:06:15.000 --> 00:06:17.910 The nice thing about waffle plots is you're not limited 159 00:06:17.910 --> 00:06:20.250 to just three to five categories 160 00:06:20.250 --> 00:06:22.500 like you are with pie charts. 
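A rough sketch of the arithmetic behind a 100-cell waffle plot, assuming Python and made-up category names and counts (none of these numbers come from the slide). Each category gets a whole number of cells, and largest-remainder rounding keeps the total at exactly 100, so a group at roughly 1% still gets its one visible cell. The function names `waffle_cells` and `render_waffle` are hypothetical helpers, not part of any charting library.

```python
from math import floor


def waffle_cells(counts, total_cells=100):
    """Split total_cells among categories in proportion to their counts."""
    total = sum(counts.values())
    exact = {name: n * total_cells / total for name, n in counts.items()}
    cells = {name: floor(x) for name, x in exact.items()}
    leftover = total_cells - sum(cells.values())
    # Largest-remainder rounding: the categories with the biggest
    # fractional parts receive the cells that flooring left unassigned.
    by_fraction = sorted(exact, key=lambda name: exact[name] - cells[name], reverse=True)
    for name in by_fraction[:leftover]:
        cells[name] += 1
    return cells


def render_waffle(cells, symbols, width=10):
    """Draw the grid as text, one character per cell, `width` cells per row."""
    flat = "".join(symbols[name] * n for name, n in cells.items())
    return "\n".join(flat[i:i + width] for i in range(0, len(flat), width))


# Hypothetical counts, chosen only so two groups land near 2% and 1%.
counts = {"group_a": 624, "group_b": 346, "group_c": 20, "group_d": 10}
cells = waffle_cells(counts)
grid = render_waffle(cells, {"group_a": "a", "group_b": "b", "group_c": "c", "group_d": "d"})
print(cells)
print(grid)
```

In the printed grid the smallest group is a single character, which is the same "one icon versus two icons" reading the waffle plot gives you.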
161 00:06:22.500 --> 00:06:25.740 You can also see the smaller segments, like 1%, 162 00:06:25.740 --> 00:06:29.430 that 1% icon there for, what is it, 163 00:06:29.430 --> 00:06:31.740 for institutionalized, I believe. 164 00:06:31.740 --> 00:06:32.700 I can see that. 165 00:06:32.700 --> 00:06:34.860 Whereas if that was a teeny little slice 166 00:06:34.860 --> 00:06:37.800 in a pie chart, it's gonna practically disappear. 167 00:06:37.800 --> 00:06:39.690 This is also called a unit diagram. 168 00:06:39.690 --> 00:06:41.670 There are a bunch of different types of unit diagrams 169 00:06:41.670 --> 00:06:43.590 but the advantage of unit diagrams 170 00:06:43.590 --> 00:06:46.440 is you have this one object that means one, 171 00:06:46.440 --> 00:06:48.330 or a hundred, or a thousand, or whatever it is, 172 00:06:48.330 --> 00:06:51.180 which makes it really easy to just sort of see the math. 173 00:06:51.180 --> 00:06:52.860 A wedge that's this big compared to the one 174 00:06:52.860 --> 00:06:56.130 that's that big, I don't know what I'm looking at 175 00:06:56.130 --> 00:06:57.990 or how to compare the two. 176 00:06:57.990 --> 00:07:00.360 Here, I can see institutionalized, 177 00:07:00.360 --> 00:07:02.520 actually it's active duty military that's 1%. 178 00:07:02.520 --> 00:07:06.630 Active duty military, 1%, institutionalized, 2%. 179 00:07:06.630 --> 00:07:08.430 I can literally see it, "Oh that's one icon. 180 00:07:08.430 --> 00:07:09.750 The other one's two icons." 181 00:07:09.750 --> 00:07:12.120 It's easy for me to figure that stuff out. 182 00:07:12.120 --> 00:07:14.310 Okay, distribution. 183 00:07:14.310 --> 00:07:18.750 Sometimes, not always, but sometimes it's really important 184 00:07:18.750 --> 00:07:21.690 for me to show my audience that distribution. 185 00:07:21.690 --> 00:07:24.360 And we talked about this also in week zero. 186 00:07:24.360 --> 00:07:27.120 Distribution lets us see the shape of the data. 
187 00:07:27.120 --> 00:07:29.010 Is there clustering at one end or the other? 188 00:07:29.010 --> 00:07:30.570 Are there weird outliers? 189 00:07:30.570 --> 00:07:33.006 Is it very evenly distributed? 190 00:07:33.006 --> 00:07:35.550 What's the range of values? 191 00:07:35.550 --> 00:07:38.040 That's when I would want to use the distribution diagram. 192 00:07:38.040 --> 00:07:39.840 And there are a whole bunch of those. 193 00:07:39.840 --> 00:07:42.810 One of them is a box plot. 194 00:07:42.810 --> 00:07:45.210 This is the weather page in "The Wall Street Journal." 195 00:07:45.210 --> 00:07:48.030 Fantastic for that audience. 196 00:07:48.030 --> 00:07:50.640 That audience craves this much detail. 197 00:07:50.640 --> 00:07:52.350 And I think we might talk about this one 198 00:07:52.350 --> 00:07:53.370 a little bit more later. 199 00:07:53.370 --> 00:07:54.203 But long story short, 200 00:07:54.203 --> 00:07:55.380 there are a bunch of distribution diagrams. 201 00:07:55.380 --> 00:07:58.080 We talked about some of them in week zero. 202 00:07:58.080 --> 00:08:00.960 Okay, so, correlation. 203 00:08:00.960 --> 00:08:03.720 Also a very specific task. 204 00:08:03.720 --> 00:08:06.060 Correlation is when I want you to know 205 00:08:06.060 --> 00:08:08.550 that these two variables are moving in the same direction 206 00:08:08.550 --> 00:08:10.410 or in opposite directions. 207 00:08:10.410 --> 00:08:13.680 It's about the relationship between those two variables. 208 00:08:13.680 --> 00:08:15.390 If that is what I need you to understand, 209 00:08:15.390 --> 00:08:17.310 which is, by the way, sometimes I want to do that 210 00:08:17.310 --> 00:08:18.577 when I want to sort of show, 211 00:08:18.577 --> 00:08:21.330 "Hey I'm trying to figure out what goes with, you know 212 00:08:21.330 --> 00:08:23.400 let's say winning athletes in the Olympics." 
213 00:08:23.400 --> 00:08:26.610 Well they tend to be younger or they tend to be older, 214 00:08:26.610 --> 00:08:29.040 or they tend to be right-handed or left-handed, 215 00:08:29.040 --> 00:08:31.410 or they like burritos versus tacos. 216 00:08:31.410 --> 00:08:34.380 I would look for a correlation between winning this 217 00:08:34.380 --> 00:08:37.290 and burrito consumption, as an example, right? 218 00:08:37.290 --> 00:08:38.820 That's correlation. 219 00:08:38.820 --> 00:08:40.440 That's why we would use correlation. 220 00:08:40.440 --> 00:08:43.200 And when I'm gonna look for correlation, again, 221 00:08:43.200 --> 00:08:44.730 I'm gonna use a chart that enables that. 222 00:08:44.730 --> 00:08:46.230 For instance, the scatter plot. 223 00:08:46.230 --> 00:08:48.770 The number one best way, proven by research, 224 00:08:48.770 --> 00:08:51.540 to show correlation. 225 00:08:51.540 --> 00:08:55.740 For analysis, by the way, as well as for communication. 226 00:08:55.740 --> 00:08:57.390 Next up is deviation. 227 00:08:57.390 --> 00:09:02.370 Sometimes it's not about the data value itself 228 00:09:02.370 --> 00:09:05.310 and it's also not about the benchmark 229 00:09:05.310 --> 00:09:07.530 or whatever it's being compared to. 230 00:09:07.530 --> 00:09:09.870 It's actually about that space in between. 231 00:09:09.870 --> 00:09:12.660 And I'll show an example of that later on in this module. 232 00:09:12.660 --> 00:09:15.480 In that case you would use a deviation chart, 233 00:09:15.480 --> 00:09:17.670 such as this deviating area chart. 234 00:09:17.670 --> 00:09:21.210 It's about emphasizing that space in between the two things 235 00:09:21.210 --> 00:09:23.880 because the size and the breadth of that space 236 00:09:23.880 --> 00:09:25.257 is sort of where the insight is. 237 00:09:25.257 --> 00:09:27.660 And like I said, I'll show an example of that. 238 00:09:27.660 --> 00:09:29.310 Sometimes it's about flow. 
239 00:09:29.310 --> 00:09:31.410 This becomes this becomes this. 240 00:09:31.410 --> 00:09:33.960 Or these two categories sort of swap in importance 241 00:09:33.960 --> 00:09:35.640 and become something else. 242 00:09:35.640 --> 00:09:39.450 Or in like accounting, when we add values 243 00:09:39.450 --> 00:09:43.500 and then subtract values, this is called a waterfall chart. 244 00:09:43.500 --> 00:09:47.370 We can sort of show how we get from zero to a total by, 245 00:09:47.370 --> 00:09:48.570 you know, adding and subtracting 246 00:09:48.570 --> 00:09:51.390 using this particular chart type. 247 00:09:51.390 --> 00:09:53.550 There's also sometimes where spatial information 248 00:09:53.550 --> 00:09:55.590 is really important, and the GIS students 249 00:09:55.590 --> 00:09:57.360 will certainly recognize this. 250 00:09:57.360 --> 00:09:59.130 Sometimes it's all about that geographic 251 00:09:59.130 --> 00:10:02.430 spatial relationship, in which case obviously a map 252 00:10:02.430 --> 00:10:04.320 is one of the options you might use. 253 00:10:04.320 --> 00:10:07.140 And then finally we have networks. 254 00:10:07.140 --> 00:10:10.410 It's about the interconnectedness of stuff. 255 00:10:10.410 --> 00:10:13.290 So in those cases we use network diagrams, 256 00:10:13.290 --> 00:10:16.037 node link diagrams being one of the examples. 257 00:10:16.037 --> 00:10:18.570 Network diagramming, network visualization, 258 00:10:18.570 --> 00:10:21.390 is particularly complex. 259 00:10:21.390 --> 00:10:23.760 Sometimes you end up with these hairball diagrams 260 00:10:23.760 --> 00:10:27.120 of just craziness, which can be informative, 261 00:10:27.120 --> 00:10:29.490 but they can also be very weird. 262 00:10:29.490 --> 00:10:32.700 It can be its own field of study on its own, I think. 263 00:10:32.700 --> 00:10:35.370 I actually find myself very rarely using network diagrams 264 00:10:35.370 --> 00:10:38.310 because they can be so fraught. 
265 00:10:38.310 --> 00:10:40.110 But they're pretty interesting when you can get 'em 266 00:10:40.110 --> 00:10:42.030 to do powerful stuff. 267 00:10:42.030 --> 00:10:44.730 So it's all about enabling tasks. 268 00:10:44.730 --> 00:10:48.690 And we talked about this a little bit in week four 269 00:10:48.690 --> 00:10:50.250 when we talked about maps, right? 270 00:10:50.250 --> 00:10:52.680 And I showed these different maps of the election 271 00:10:52.680 --> 00:10:55.170 and how each of these maps is valid. 272 00:10:55.170 --> 00:10:57.090 Each one enables a different task. 273 00:10:57.090 --> 00:11:00.420 Right here I can see which county won, you know, 274 00:11:00.420 --> 00:11:01.620 for which candidate. 275 00:11:01.620 --> 00:11:03.240 Here I can see which county won 276 00:11:03.240 --> 00:11:06.660 and the intensity of that win plus what the actual impact is 277 00:11:06.660 --> 00:11:07.680 because of population. 278 00:11:07.680 --> 00:11:11.070 Here I can see statewide impact due to population 279 00:11:11.070 --> 00:11:12.330 and which person won. 280 00:11:12.330 --> 00:11:15.210 And this one of course more about the blending, you know, 281 00:11:15.210 --> 00:11:19.590 enabling me to see how mixed and how we're not quite 282 00:11:19.590 --> 00:11:21.930 as polarized as we might think. 283 00:11:21.930 --> 00:11:25.110 Different tasks enable different insights, 284 00:11:25.110 --> 00:11:27.030 tell different stories, right? 285 00:11:27.030 --> 00:11:28.800 So, here's the thing. 286 00:11:28.800 --> 00:11:31.650 When we're creating visualizations, 287 00:11:31.650 --> 00:11:34.530 yes, it is about picking charts, 288 00:11:34.530 --> 00:11:37.020 but it's about other stuff also, okay? 289 00:11:37.020 --> 00:11:39.810 And we're gonna talk about that other stuff later. 
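One way to keep the task categories above straight is a small lookup table. This is only a sketch in Python: the chart names are the examples mentioned in this lesson, not a complete list, and `CHARTS_BY_TASK` and `suggest_charts` are hypothetical names made up for illustration.

```python
# Task -> example charts, limited to the examples named in this lesson.
CHARTS_BY_TASK = {
    "comparison": ["bar chart"],
    "trend": ["line chart"],
    "composition": ["pie chart", "stacked column chart", "waffle plot"],
    "distribution": ["box plot"],
    "correlation": ["scatter plot"],
    "deviation": ["deviating area chart"],
    "flow": ["waterfall chart"],
    "spatial": ["map"],
    "network": ["node-link diagram"],
}


def suggest_charts(task):
    """Return the example charts for a task, ignoring case and outer spaces."""
    key = task.strip().lower()
    if key not in CHARTS_BY_TASK:
        raise KeyError(f"unknown task {task!r}; known tasks: {sorted(CHARTS_BY_TASK)}")
    return CHARTS_BY_TASK[key]


print(suggest_charts("Trend"))        # ['line chart']
print(suggest_charts("composition"))  # ['pie chart', 'stacked column chart', 'waffle plot']
```

The lookup starts from the task, not from the chart, which is the order of thinking the lesson argues for.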
290 00:11:39.810 --> 00:11:43.740 But I do wanna say when it is time to pick charts 291 00:11:43.740 --> 00:11:46.650 based on those tasks that we were just talking about, 292 00:11:46.650 --> 00:11:48.720 you're not alone, okay? 293 00:11:48.720 --> 00:11:51.462 I'm gonna share with you two cheat sheets 294 00:11:51.462 --> 00:11:55.590 that you should absolutely download and check out. 295 00:11:55.590 --> 00:11:57.540 Every time, in fact, you're doing data visualization 296 00:11:57.540 --> 00:12:00.120 you should look at these and be inspired, 297 00:12:00.120 --> 00:12:01.770 reminded about what's out there. 298 00:12:01.770 --> 00:12:03.810 This one being the Visual Vocabulary 299 00:12:03.810 --> 00:12:04.837 from "The Financial Times." 300 00:12:04.837 --> 00:12:08.190 "Financial Times" does great data visualization work. 301 00:12:08.190 --> 00:12:10.140 If you need to show change over time, 302 00:12:10.140 --> 00:12:12.240 you can see the green column there, 303 00:12:12.240 --> 00:12:14.190 those are some of the options for you. 304 00:12:14.190 --> 00:12:15.450 This is not a complete list. 305 00:12:15.450 --> 00:12:17.250 There are a bazillion charts out there, 306 00:12:17.250 --> 00:12:19.170 but this is a really solid list of options 307 00:12:19.170 --> 00:12:20.910 if you wanna show change over time. 308 00:12:20.910 --> 00:12:22.590 If you wanna show distribution, 309 00:12:22.590 --> 00:12:25.980 there are some good distribution diagrams, et cetera. 310 00:12:25.980 --> 00:12:29.940 Now, one thing about the Visual Vocabulary 311 00:12:29.940 --> 00:12:33.870 is you'll see the two columns, ranking and magnitude. 312 00:12:33.870 --> 00:12:36.150 Those are separate things. 313 00:12:36.150 --> 00:12:37.327 Sometimes I just want to show you, 314 00:12:37.327 --> 00:12:39.570 "Look, this is number one, this is number two, 315 00:12:39.570 --> 00:12:41.070 this is number three. 
316 00:12:41.070 --> 00:12:43.410 And listen, the score for number one might have been 317 00:12:43.410 --> 00:12:45.420 800 times as high as the score for number two, 318 00:12:45.420 --> 00:12:46.380 but I'm not emphasizing that. 319 00:12:46.380 --> 00:12:49.770 I'm just emphasizing placement, right, ranking." 320 00:12:49.770 --> 00:12:51.510 Some charts will do that. 321 00:12:51.510 --> 00:12:54.930 And sometimes the same chart will also show magnitude, 322 00:12:54.930 --> 00:12:56.520 the actual scores. 323 00:12:56.520 --> 00:12:59.310 Some charts will just do rank and that's okay. 324 00:12:59.310 --> 00:13:00.420 It's about your KWYs. 325 00:13:00.420 --> 00:13:01.380 What are you trying to accomplish? 326 00:13:01.380 --> 00:13:04.239 Just show rank or also show the magnitude, 327 00:13:04.239 --> 00:13:05.640 or one or the other, whatever. 328 00:13:05.640 --> 00:13:09.390 Long story short, ranking and magnitude, 329 00:13:09.390 --> 00:13:10.920 even though they are different tasks, 330 00:13:10.920 --> 00:13:13.950 and you can use different charts to accomplish them, 331 00:13:13.950 --> 00:13:17.235 the task that you want your audience to perform 332 00:13:17.235 --> 00:13:20.100 when you're showing them rank or magnitude, 333 00:13:20.100 --> 00:13:22.830 is often comparison. 334 00:13:22.830 --> 00:13:25.410 So, "The Financial Times" does not use comparison 335 00:13:25.410 --> 00:13:28.380 as the word, as a descriptor of any of these columns. 336 00:13:28.380 --> 00:13:30.990 But just a sort of a little hint, 337 00:13:30.990 --> 00:13:33.720 when you're thinking about comparison as the task, 338 00:13:33.720 --> 00:13:35.760 ranking and magnitude are probably the two columns 339 00:13:35.760 --> 00:13:36.990 you wanna look at here. 340 00:13:36.990 --> 00:13:39.420 Now there's another cheat sheet out there 341 00:13:39.420 --> 00:13:41.520 called datavizproject.com that I like. 
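To make the ranking versus magnitude distinction concrete, here is a small sketch, assuming Python and made-up scores. The helper `rank_positions` is hypothetical. The two score sets differ enormously in magnitude (in the second, first place scores 800 times second place), yet a rank-only view of them is identical.

```python
def rank_positions(scores):
    """Map each name to its placement (1 = highest score), dropping the scores."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: position for position, name in enumerate(ordered, start=1)}


# Made-up scores for illustration only.
close_race = {"alpha": 101, "beta": 100, "gamma": 99}
blowout = {"alpha": 80_000, "beta": 100, "gamma": 99}

print(rank_positions(close_race))  # {'alpha': 1, 'beta': 2, 'gamma': 3}
print(rank_positions(blowout))     # {'alpha': 1, 'beta': 2, 'gamma': 3}
```

A ranking chart would draw both data sets the same way, so if the size of the gap is part of what the audience needs, the chart also has to encode magnitude.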
342 00:13:41.520 --> 00:13:44.130 And this one's a website so I'm just gonna click into it. 343 00:13:44.130 --> 00:13:48.690 And this one also, you can choose charts by function, 344 00:13:48.690 --> 00:13:49.560 like I was just talking about. 345 00:13:49.560 --> 00:13:52.260 Correlation, distribution, et cetera. 346 00:13:52.260 --> 00:13:54.930 Here they have comparison as the function. 347 00:13:54.930 --> 00:13:58.620 So this one has ranking and magnitude charts in there. 348 00:13:58.620 --> 00:14:00.690 Now what I like about this website as opposed 349 00:14:00.690 --> 00:14:05.160 to the Visual Vocabulary is this is a really big collection, 350 00:14:05.160 --> 00:14:08.310 a much longer list of visualizations. 351 00:14:08.310 --> 00:14:09.660 So that's super cool. 352 00:14:09.660 --> 00:14:13.080 Now there's a downside to it 'cause it includes 353 00:14:13.080 --> 00:14:17.430 visualizations like the convex tree map, 354 00:14:17.430 --> 00:14:19.110 which is like, insane. 355 00:14:19.110 --> 00:14:22.830 No human can accurately compare this triangle 356 00:14:22.830 --> 00:14:25.350 to this trapezoidal, whatever. 357 00:14:25.350 --> 00:14:26.490 Like, no. 358 00:14:26.490 --> 00:14:29.190 This is not a chart I would recommend you use, okay? 359 00:14:29.190 --> 00:14:30.900 How about this one? 360 00:14:30.900 --> 00:14:33.300 The Taylor diagram. 361 00:14:33.300 --> 00:14:34.800 Does anyone know what that's for? 362 00:14:34.800 --> 00:14:36.870 I think it has a very specific purpose. 363 00:14:36.870 --> 00:14:38.160 Even the Ternary plot, 364 00:14:38.160 --> 00:14:40.320 which is actually a pretty neat chart type. 365 00:14:40.320 --> 00:14:41.820 It shows the correlation between 366 00:14:41.820 --> 00:14:43.593 three variables essentially. 367 00:14:44.430 --> 00:14:47.250 It's pretty cool, but it's kind of hard 368 00:14:47.250 --> 00:14:49.080 to wrap your head around, especially for normal humans. 
369 00:14:49.080 --> 00:14:51.180 So, you know, this is a great list. 370 00:14:51.180 --> 00:14:52.980 I do look at both of these websites, 371 00:14:52.980 --> 00:14:56.550 both of these tools, frequently as a reminder 372 00:14:56.550 --> 00:14:58.800 of the options out there for me. 373 00:14:58.800 --> 00:15:01.680 By the way, the bump chart, this is a ranking chart. 374 00:15:01.680 --> 00:15:03.780 This shows the rank position of these different things 375 00:15:03.780 --> 00:15:05.400 without actually showing the score. 376 00:15:05.400 --> 00:15:07.380 So, just 'cause I mentioned that earlier. 377 00:15:07.380 --> 00:15:10.380 Long story short, use both of these. 378 00:15:10.380 --> 00:15:13.350 Be inspired, be reminded of your options, 379 00:15:13.350 --> 00:15:15.330 and especially as you're learning, you know, 380 00:15:15.330 --> 00:15:16.620 learn what your options are, right? 381 00:15:16.620 --> 00:15:19.383 These are very, very useful tools to do that. 382 00:15:20.538 --> 00:15:22.050 Okay, now I wanna say a couple more things here 383 00:15:22.050 --> 00:15:24.420 before we move on to the next lesson, the next video. 384 00:15:24.420 --> 00:15:25.863 So, one thing is this. 385 00:15:26.910 --> 00:15:28.800 We love our round shapes. 386 00:15:28.800 --> 00:15:32.010 Humans love a round shape, right? 387 00:15:32.010 --> 00:15:35.700 We don't really like straight edges and harsh corners, okay? 388 00:15:35.700 --> 00:15:38.880 But, and we love pie too, at least I do. 389 00:15:38.880 --> 00:15:41.400 But there are problems with round shapes. 390 00:15:41.400 --> 00:15:42.783 One of them being this. 391 00:15:43.620 --> 00:15:45.997 If I were to ask you to tell me, 392 00:15:45.997 --> 00:15:49.080 "Which is bigger? A or B?" 393 00:15:49.080 --> 00:15:51.420 And I do this all the time when I do live sessions 394 00:15:51.420 --> 00:15:53.253 with workshop clients. 
395 00:15:54.090 --> 00:15:56.550 People usually say, "Oh I think it's B," 396 00:15:56.550 --> 00:15:58.530 or, "No, maybe it's A, or maybe they're the same." 397 00:15:58.530 --> 00:16:00.930 Like essentially there's a good debate, okay? 398 00:16:00.930 --> 00:16:03.210 Which is already a problem because there is 399 00:16:03.210 --> 00:16:06.750 a real difference here and it is pretty noticeable. 400 00:16:06.750 --> 00:16:08.310 B is bigger. 401 00:16:08.310 --> 00:16:13.170 But then I ask, "How much bigger?" and universally, 402 00:16:13.170 --> 00:16:15.217 almost without exception, people will say, 403 00:16:15.217 --> 00:16:17.070 "Oh I don't know, 5%? 404 00:16:17.070 --> 00:16:19.680 Certainly not more than 5%, no way." 405 00:16:19.680 --> 00:16:21.630 Then I show them this. 406 00:16:21.630 --> 00:16:25.560 And clearly B is bigger, not even close. 407 00:16:25.560 --> 00:16:27.810 And it is much, much more than 5%. 408 00:16:27.810 --> 00:16:32.640 It's more than 10%, easily more than 10, maybe close to 15%. 409 00:16:32.640 --> 00:16:36.600 So, this is a problem, all right? 410 00:16:36.600 --> 00:16:37.680 Now at the same time, 411 00:16:37.680 --> 00:16:39.360 if I asked you to tell me what percentage 412 00:16:39.360 --> 00:16:42.240 of the whole is this bar over here, 413 00:16:42.240 --> 00:16:43.770 you wouldn't be able to answer it. 414 00:16:43.770 --> 00:16:46.860 So the bar chart fails at the composition display, 415 00:16:46.860 --> 00:16:50.250 the pie chart fails at distinct value comparison. 416 00:16:50.250 --> 00:16:52.380 Proves the point that I've been talking about. 417 00:16:52.380 --> 00:16:55.710 But here's the thing, why are we so bad at this? 418 00:16:55.710 --> 00:16:58.560 Because human beings are very bad 419 00:16:58.560 --> 00:17:02.100 at pre-attentively measuring the areas of circles, 420 00:17:02.100 --> 00:17:04.410 comparing the areas of circles, 421 00:17:04.410 --> 00:17:07.350 and of partial circles, wedges. 
422 00:17:07.350 --> 00:17:09.330 And by the way, we're bad at doing it pre-attentively. 423 00:17:09.330 --> 00:17:11.880 We're also even bad at doing it attentively. 424 00:17:11.880 --> 00:17:13.350 You can stare at this thing for 10 minutes 425 00:17:13.350 --> 00:17:15.450 and you're not gonna succeed, okay? 426 00:17:15.450 --> 00:17:16.957 On top of that, you may say, 427 00:17:16.957 --> 00:17:19.110 "All right, well maybe I'll compare the angle, right? 428 00:17:19.110 --> 00:17:21.060 The angle inside and here." 429 00:17:21.060 --> 00:17:22.957 Nope, you suck at that too. 430 00:17:22.957 --> 00:17:24.960 "Maybe I'll compare the arc length." 431 00:17:24.960 --> 00:17:26.790 Nope, failure, okay? 432 00:17:26.790 --> 00:17:30.030 So, no matter what we do, we're not gonna do a good job 433 00:17:30.030 --> 00:17:33.960 at precise comparison with a pie chart. 434 00:17:33.960 --> 00:17:36.630 Now, at the same time, it's much bigger. 435 00:17:36.630 --> 00:17:38.070 It's about two thirds. 436 00:17:38.070 --> 00:17:39.900 Yeah, pie charts are fine at that. 437 00:17:39.900 --> 00:17:44.070 So as long as what you're doing is a vague comparison 438 00:17:44.070 --> 00:17:46.680 and precision doesn't matter at all, go for it. 439 00:17:46.680 --> 00:17:47.513 Use a pie chart. 440 00:17:47.513 --> 00:17:51.150 But again, only if the proportional share is the key point. 441 00:17:51.150 --> 00:17:52.477 Okay? 442 00:17:52.477 --> 00:17:53.670 "They're about the same. 443 00:17:53.670 --> 00:17:55.560 Maybe I think one's a little bit bigger." 444 00:17:55.560 --> 00:17:57.030 Fine at that. 445 00:17:57.030 --> 00:18:00.090 Now, as part of the proof, sort of an argument 446 00:18:00.090 --> 00:18:03.720 for the failure of our ability to compare circles, 447 00:18:03.720 --> 00:18:05.970 this circle back here, this half circle, 448 00:18:05.970 --> 00:18:07.923 is supposed to represent $1.5 billion. 
449 00:18:08.850 --> 00:18:13.050 This one is 2.1 billion, 40% bigger. 450 00:18:13.050 --> 00:18:15.840 It's only a tiny bit wider, right? 451 00:18:15.840 --> 00:18:17.490 Tiny bit bigger diameter, 452 00:18:17.490 --> 00:18:19.710 but it's supposed to be 40% bigger. 453 00:18:19.710 --> 00:18:21.990 Which brings up an important point, by the way. 454 00:18:21.990 --> 00:18:26.990 So, we are always comparing the areas of these objects 455 00:18:27.810 --> 00:18:29.370 and you remember pi r squared, 456 00:18:29.370 --> 00:18:31.560 how you calculate the area of a circle? 457 00:18:31.560 --> 00:18:35.670 That r, that radius or the diameter, expands a tiny bit 458 00:18:35.670 --> 00:18:36.900 and because of pi r squared, 459 00:18:36.900 --> 00:18:40.380 it massively increases the area of the object. 460 00:18:40.380 --> 00:18:41.520 So, just keep that in mind. 461 00:18:41.520 --> 00:18:42.353 All right. 462 00:18:42.353 --> 00:18:45.090 Another problem with round shapes is something called 463 00:18:45.090 --> 00:18:46.923 the Jastrow illusion. 464 00:18:47.790 --> 00:18:52.560 You have two wooden train track pieces here. 465 00:18:52.560 --> 00:18:53.823 They are identical. 466 00:18:54.840 --> 00:18:56.850 They don't look it, but they are. 467 00:18:56.850 --> 00:18:58.860 You could pick one of them up, put it on the other one 468 00:18:58.860 --> 00:19:01.830 and you would find that they fit perfectly. 469 00:19:01.830 --> 00:19:04.740 This is a photograph, this is not like manipulated, okay? 470 00:19:04.740 --> 00:19:08.220 So when you have like a double donut chart, 471 00:19:08.220 --> 00:19:12.360 the outer arc will look much smaller than the inner arc. 472 00:19:12.360 --> 00:19:13.620 That's a problem. 473 00:19:13.620 --> 00:19:14.970 If you're comparing these two things, 474 00:19:14.970 --> 00:19:17.460 you're gonna say the bottom one is bigger than the top one. 475 00:19:17.460 --> 00:19:20.250 So, you gotta be cautious about that, okay? 
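Coming back to the pi r squared point about the two half circles: the arithmetic can be checked with a short Python sketch. It assumes the two shapes are scaled by area, as described above, and uses the 1.5 billion and 2.1 billion figures from the example. The function name diameter_ratio_for_area_ratio is made up for illustration and does not come from any charting tool.

```python
import math


def diameter_ratio_for_area_ratio(area_ratio: float) -> float:
    """Return how many times wider a circle gets when its area grows by area_ratio.

    Area is pi * r**2, so area scales with the square of the radius (or diameter).
    The width therefore only scales with the square root of the area ratio.
    The same holds for half circles, since the factor of 1/2 cancels out.
    """
    return math.sqrt(area_ratio)


# Values from the example, in billions of dollars.
small_value = 1.5
large_value = 2.1

# Assumption: each value is encoded as the area of its shape.
area_ratio = large_value / small_value
width_ratio = diameter_ratio_for_area_ratio(area_ratio)

print(f"Area ratio:     {area_ratio:.2f}")
print(f"Diameter ratio: {width_ratio:.2f}")
```

Under that assumption, the shape that stands for 40% more money only comes out roughly 18% wider, which is why the difference is so easy to underestimate by eye.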
476 00:19:20.250 --> 00:19:25.110 So, optical illusions abound with circular shapes. 477 00:19:25.110 --> 00:19:28.260 And yet with all of that said, 478 00:19:28.260 --> 00:19:30.420 and I'll add one more thing on top of it, 479 00:19:30.420 --> 00:19:31.500 don't do 3D charts. 480 00:19:31.500 --> 00:19:35.643 3D charts are abysmal, they fail, they're terrible always. 481 00:19:36.900 --> 00:19:38.430 And yet this works. 482 00:19:38.430 --> 00:19:39.930 So, it's a 3D pie chart, 483 00:19:39.930 --> 00:19:42.990 like the worst offense in all of data visualization. 484 00:19:42.990 --> 00:19:46.260 But I like it and I think it works. 485 00:19:46.260 --> 00:19:49.290 So this is an example where all of the quote unquote rules, 486 00:19:49.290 --> 00:19:51.720 which really aren't rules, they're just sort of guidance. 487 00:19:51.720 --> 00:19:56.250 Some of which is based, you know, on serious research for sure. 488 00:19:56.250 --> 00:19:58.290 But the fact of the matter is, 489 00:19:58.290 --> 00:20:00.150 sometimes things work even though they shouldn't. 490 00:20:00.150 --> 00:20:03.240 Okay? This is where you can do what you want to do, 491 00:20:03.240 --> 00:20:05.940 try it out, and test it. 492 00:20:05.940 --> 00:20:07.507 If you showed this to somebody and said, 493 00:20:07.507 --> 00:20:08.730 "Well what does this tell you? 494 00:20:08.730 --> 00:20:09.960 What did you learn from this?" 495 00:20:09.960 --> 00:20:12.330 And you want to ask open-ended questions like that. 496 00:20:12.330 --> 00:20:14.220 And if they said, "Yeah, yeah, okay. 497 00:20:14.220 --> 00:20:15.930 Of all the water on Earth, 498 00:20:15.930 --> 00:20:18.090 the vast majority of it is saltwater 499 00:20:18.090 --> 00:20:20.340 and only a tiny bit is freshwater.
500 00:20:20.340 --> 00:20:22.530 The vast majority of which is frozen 501 00:20:22.530 --> 00:20:26.280 and/or deep in the ground, and only a tiny little sliver 502 00:20:26.280 --> 00:20:29.010 of that is actually accessible to us as drinking water 503 00:20:29.010 --> 00:20:31.620 in lakes or in the soil or whatever," 504 00:20:31.620 --> 00:20:34.860 then you know what, this succeeded. 505 00:20:34.860 --> 00:20:38.370 And I can read this chart and understand what it means. 506 00:20:38.370 --> 00:20:40.920 So, take it all with a grain of salt. 507 00:20:40.920 --> 00:20:44.040 Most importantly, try different visuals. 508 00:20:44.040 --> 00:20:45.030 Try a bunch of different things. 509 00:20:45.030 --> 00:20:47.130 You have a data set, especially if you're in software 510 00:20:47.130 --> 00:20:49.590 where you do have a bunch of buttons to click, 511 00:20:49.590 --> 00:20:51.840 click all the buttons and see what you see, 512 00:20:51.840 --> 00:20:54.660 and then test it, okay? 513 00:20:54.660 --> 00:20:56.430 Test it and see if it works. 514 00:20:56.430 --> 00:20:59.160 Find inspiration in tools like the Visual Vocabulary 515 00:20:59.160 --> 00:21:01.590 and datavizproject.com to help you identify 516 00:21:01.590 --> 00:21:03.450 charts to try. 517 00:21:03.450 --> 00:21:07.710 And also don't be limited by the charts in your tools. 518 00:21:07.710 --> 00:21:09.210 So, we're gonna talk more about that 519 00:21:09.210 --> 00:21:11.260 as we go through the rest of this module.