WEBVTT 1 00:00:00.210 --> 00:00:02.070 All right, last but not least for this week, 2 00:00:02.070 --> 00:00:04.920 I want to look at some examples of data visualization 3 00:00:04.920 --> 00:00:07.320 and just think about them all in the context 4 00:00:07.320 --> 00:00:08.640 of what we've been talking about all week: 5 00:00:08.640 --> 00:00:12.330 chart selection, what to put in your charts, et cetera, 6 00:00:12.330 --> 00:00:14.070 starting off with this. 7 00:00:14.070 --> 00:00:15.870 We looked at this a few modules back 8 00:00:15.870 --> 00:00:17.820 when we were talking about maps, 9 00:00:17.820 --> 00:00:22.500 the idea that we can see the size of this iceberg 10 00:00:22.500 --> 00:00:25.830 that broke away from Antarctica 11 00:00:25.830 --> 00:00:29.430 in relation to other geographic features that we recognize, 12 00:00:29.430 --> 00:00:31.650 like Puerto Rico, Jamaica, et cetera. 13 00:00:31.650 --> 00:00:32.700 This is cool. 14 00:00:32.700 --> 00:00:35.220 But one thing I didn't mention, and this is the right time 15 00:00:35.220 --> 00:00:40.050 to talk about it, is that while I understand Puerto Rico 16 00:00:40.050 --> 00:00:42.630 and how big it is compared to Florida, 17 00:00:42.630 --> 00:00:46.140 the United States, whatever, sort of, 18 00:00:46.140 --> 00:00:48.840 it's a little bit hard to make precise comparisons 19 00:00:48.840 --> 00:00:51.933 when you're talking about weird shapes, right? 20 00:00:53.220 --> 00:00:55.860 What's the percentage difference between this iceberg 21 00:00:55.860 --> 00:00:57.270 and Bali, for instance? 22 00:00:57.270 --> 00:00:58.950 Good luck, you can't do it. 23 00:00:58.950 --> 00:01:01.920 That's why this other approach, which we also looked at, 24 00:01:01.920 --> 00:01:06.920 is a better visualization for precise comparison. 25 00:01:07.590 --> 00:01:10.050 We're really good at comparing the areas of rectangles.
26 00:01:10.050 --> 00:01:11.340 Remember, we're bad at it with circles, 27 00:01:11.340 --> 00:01:13.440 but we're pretty good at it with rectangles. 28 00:01:13.440 --> 00:01:14.400 So we'll do pretty well here. 29 00:01:14.400 --> 00:01:18.330 Clearly the closest comparison looks like it's Cape Verde 30 00:01:18.330 --> 00:01:21.660 or maybe, I can't see it, I can't read the text, 31 00:01:21.660 --> 00:01:24.480 it's too small, but one of the other ones we can see here. 32 00:01:24.480 --> 00:01:27.120 Long story short, it's much easier to make 33 00:01:27.120 --> 00:01:29.340 that precise comparison with this visual, 34 00:01:29.340 --> 00:01:32.700 but it's nice to have that geographic comparison 35 00:01:32.700 --> 00:01:35.400 to relate it to things that I kind of know about. 36 00:01:35.400 --> 00:01:38.220 So they're both good. 37 00:01:38.220 --> 00:01:40.830 That's why using both of them in that data story 38 00:01:40.830 --> 00:01:43.110 was effective, it was a smart idea. 39 00:01:43.110 --> 00:01:44.790 Here's another example of a data story 40 00:01:44.790 --> 00:01:46.140 that has great data visualization. 41 00:01:46.140 --> 00:01:46.973 We're gonna go back into 42 00:01:46.973 --> 00:01:48.750 the "Atlas of Sustainable Development Goals". 43 00:01:48.750 --> 00:01:50.640 We looked at this a while back 44 00:01:50.640 --> 00:01:52.290 when we were talking about storytelling. 45 00:01:52.290 --> 00:01:53.820 Here we're looking at the global health 46 00:01:53.820 --> 00:01:57.270 amid a pandemic story, and again, good structure, 47 00:01:57.270 --> 00:01:59.220 all this stuff we talked about last time, 48 00:01:59.220 --> 00:02:01.950 but specifically looking at the visualizations. 49 00:02:01.950 --> 00:02:05.220 This is about the pandemic, and the first part 50 00:02:05.220 --> 00:02:07.440 of the story is about how local health risks 51 00:02:07.440 --> 00:02:09.540 can become global health risks. 52 00:02:09.540 --> 00:02:10.530 This is an example.
53 00:02:10.530 --> 00:02:15.030 I mentioned this briefly during module four about mapping, 54 00:02:15.030 --> 00:02:15.990 but you shouldn't make a map 55 00:02:15.990 --> 00:02:18.000 just because you have geographic information. 56 00:02:18.000 --> 00:02:19.890 Just because we have geographic information 57 00:02:19.890 --> 00:02:22.200 about where zoonosis events occur, 58 00:02:22.200 --> 00:02:24.570 when a disease jumps from animals to humans, 59 00:02:24.570 --> 00:02:26.700 doesn't mean we should put it on a map. 60 00:02:26.700 --> 00:02:29.670 But this data story right now is about the idea 61 00:02:29.670 --> 00:02:32.010 that yes, it's localized. 62 00:02:32.010 --> 00:02:36.104 Zoonosis happens more here than it happens over here. 63 00:02:36.104 --> 00:02:39.720 Therefore, the point of the data story might be: 64 00:02:39.720 --> 00:02:41.970 maybe we should put more monitoring stations here 65 00:02:41.970 --> 00:02:44.430 and more people here to sort of keep track of this stuff, 66 00:02:44.430 --> 00:02:46.830 and we don't need as many of them over here. 67 00:02:46.830 --> 00:02:49.020 There's also the idea that if zoonosis happens here, 68 00:02:49.020 --> 00:02:52.050 we can expect the spread to occur in areas 69 00:02:52.050 --> 00:02:54.930 that have connections with this area, et cetera, et cetera. 70 00:02:54.930 --> 00:02:55.980 But of course it's all global 71 00:02:55.980 --> 00:02:57.150 because people get on airplanes, 72 00:02:57.150 --> 00:02:59.280 as we now know very, very, very well, right? 73 00:02:59.280 --> 00:03:03.390 Anyway, long story short, the map is a good visualization 74 00:03:03.390 --> 00:03:05.940 for this part of the story. 75 00:03:05.940 --> 00:03:08.850 As the story continues, we're looking at access 76 00:03:08.850 --> 00:03:11.790 to health care as part of this global pandemic story, 77 00:03:11.790 --> 00:03:15.630 and looking at it segmented by income level.
78 00:03:15.630 --> 00:03:19.470 Low income countries have a very low median rate 79 00:03:19.470 --> 00:03:22.980 of nurses and physicians per 1,000 people. 80 00:03:22.980 --> 00:03:24.840 Ah, another rate, right? 81 00:03:24.840 --> 00:03:28.050 And this is a nice little swarm plot. 82 00:03:28.050 --> 00:03:30.540 So I don't just get the median; remember, we talked 83 00:03:30.540 --> 00:03:33.391 about showing people the data if you have it. 84 00:03:33.391 --> 00:03:36.840 Low income countries have a very low median, 85 00:03:36.840 --> 00:03:38.760 fairly clustered at the low end, 86 00:03:38.760 --> 00:03:41.340 but one interesting outlier up here. 87 00:03:41.340 --> 00:03:44.610 High income countries have a very large spread 88 00:03:44.610 --> 00:03:45.900 of access to health care. 89 00:03:45.900 --> 00:03:49.680 It's not very monolithic. There's a much higher median value 90 00:03:49.680 --> 00:03:52.140 than in low income countries, but the poorest, 91 00:03:52.140 --> 00:03:55.470 or rather, of the high income countries, 92 00:03:55.470 --> 00:03:57.840 the one with the lowest access to health care 93 00:03:57.840 --> 00:04:01.140 is about as low as the number two low income country, 94 00:04:01.140 --> 00:04:02.730 or maybe a little bit higher. 95 00:04:02.730 --> 00:04:05.220 Long story short, there's a lot of nuance to be found here, 96 00:04:05.220 --> 00:04:07.320 and I have the ability to find individual countries 97 00:04:07.320 --> 00:04:09.150 if I want to look into them. 98 00:04:09.150 --> 00:04:12.210 But think about this visualization. 99 00:04:12.210 --> 00:04:14.427 It's a distribution diagram, yes. 100 00:04:14.427 --> 00:04:16.410 I have access to all the data points, yes. 101 00:04:16.410 --> 00:04:19.140 I can see the clustering, I can see the outliers, 102 00:04:19.140 --> 00:04:20.490 all that stuff.
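The swarm plot pairs a per-group median with every individual country. A minimal Python sketch of the numbers underneath such a chart follows, assuming hypothetical country values (not the Atlas's data) and using the common 1.5 × IQR rule to flag outliers; that rule is an assumption for illustration, since the lecture doesn't say how the Atlas defines an outlier.

```python
from __future__ import annotations

from statistics import median, quantiles


def rate_per_1000(count: int, population: int) -> float:
    """Turn a raw count (e.g. nurses and physicians) into a rate per 1,000 people."""
    return count * 1000 / population


def summarize_groups(groups: dict[str, list[float]]) -> dict[str, dict]:
    """Per group: the median, any points outside the 1.5 * IQR fences, and all points.

    The outlier rule is a common convention, assumed here for illustration.
    """
    summary = {}
    for name, values in groups.items():
        q1, _, q3 = quantiles(values, n=4)  # quartiles (needs at least 2 values)
        fence = 1.5 * (q3 - q1)
        summary[name] = {
            "median": median(values),
            "outliers": [v for v in values if v < q1 - fence or v > q3 + fence],
            "points": sorted(values),  # keep every point, as a swarm plot does
        }
    return summary


# Hypothetical rates per 1,000 people, one value per country.
groups = {
    "low income": [0.2, 0.3, 0.4, 0.5, 0.5, 0.6, 4.1],
    "high income": [2.5, 5.0, 7.5, 10.0, 12.5, 15.0],
}
summary = summarize_groups(groups)
print(summary["low income"]["median"], summary["low income"]["outliers"])  # → 0.5 [4.1]
```

A plotting library would then draw the `points` for each group side by side, with the median marked; the summary alone is what a bar of medians would show, and the points are what the swarm plot adds.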
103 00:04:20.490 --> 00:04:23.490 It's a good chart type for segmenting, to allow me to see 104 00:04:23.490 --> 00:04:24.540 the difference between low income 105 00:04:24.540 --> 00:04:26.010 and high income, et cetera. 106 00:04:26.010 --> 00:04:29.520 Now, later on in the data story, right after that, 107 00:04:29.520 --> 00:04:30.660 they're doing something similar. 108 00:04:30.660 --> 00:04:31.560 They're showing me different data. 109 00:04:31.560 --> 00:04:32.430 This is about the coverage 110 00:04:32.430 --> 00:04:35.493 of essential health services by income. 111 00:04:37.230 --> 00:04:38.460 Once again, though, they're segmenting 112 00:04:38.460 --> 00:04:41.310 by low income, lower middle income, et cetera. 113 00:04:41.310 --> 00:04:43.680 So of course they're gonna show me the same chart type. 114 00:04:43.680 --> 00:04:45.930 This brings up an important point. 115 00:04:45.930 --> 00:04:48.390 You should absolutely keep things fresh 116 00:04:48.390 --> 00:04:50.130 and interesting for your audiences. 117 00:04:50.130 --> 00:04:54.570 If you have a slide deck with 800 bar charts, 118 00:04:54.570 --> 00:04:56.760 I'm gonna fall asleep, yes. 119 00:04:56.760 --> 00:05:01.760 But if you have 800 slides where the task 120 00:05:02.130 --> 00:05:05.730 I'm supposed to perform is exactly the same, 121 00:05:05.730 --> 00:05:08.280 and it's all one-dimensional data, 122 00:05:08.280 --> 00:05:12.210 and if you've decided on slide one 123 00:05:12.210 --> 00:05:16.140 that the bar chart is the right chart for that task, 124 00:05:16.140 --> 00:05:18.510 you'd better repeat that chart 800 times. 125 00:05:18.510 --> 00:05:20.190 So yeah, I might fall asleep, 126 00:05:20.190 --> 00:05:22.200 but maybe that's because it's a boring data story, 127 00:05:22.200 --> 00:05:24.270 not because you're repeating the same chart. 128 00:05:24.270 --> 00:05:26.070 You must repeat the same chart 129 00:05:26.070 --> 00:05:28.200 if you're asking me to do the same thing.
130 00:05:28.200 --> 00:05:29.970 Otherwise I'm gonna be confused. 131 00:05:29.970 --> 00:05:33.120 If I saw low income, middle income, et cetera, countries 132 00:05:33.120 --> 00:05:35.490 and you were showing the exact same type of thing, 133 00:05:35.490 --> 00:05:38.790 but this one was a box-and-whisker plot 134 00:05:38.790 --> 00:05:40.530 and the other one was a swarm plot, 135 00:05:40.530 --> 00:05:42.120 I might be like, why are you swapping it out? 136 00:05:42.120 --> 00:05:42.953 Why are you switching it up? 137 00:05:42.953 --> 00:05:45.870 Or, especially, if this one was bars or something: 138 00:05:45.870 --> 00:05:46.703 what are you doing to me? 139 00:05:46.703 --> 00:05:48.720 You're confusing me, right? 140 00:05:48.720 --> 00:05:50.490 Make the chart match the task. 141 00:05:50.490 --> 00:05:53.700 Once you've assigned a chart to a task, repeat it. 142 00:05:53.700 --> 00:05:56.790 Otherwise, essentially, I'm gonna be distracted 143 00:05:56.790 --> 00:05:59.490 and thinking about, why are you switching charts on me? 144 00:05:59.490 --> 00:06:01.980 So repeat charts where appropriate. 145 00:06:01.980 --> 00:06:04.230 Later on in this data story, they have a connected scatterplot, 146 00:06:04.230 --> 00:06:05.520 one of those master class charts 147 00:06:05.520 --> 00:06:06.390 that we were just looking at. 148 00:06:06.390 --> 00:06:07.650 Again, a very effective chart 149 00:06:07.650 --> 00:06:09.960 when you have time to explain it to your audience. 150 00:06:09.960 --> 00:06:12.480 And last but not least... actually, two more. 151 00:06:12.480 --> 00:06:14.250 I love a slope graph. 152 00:06:14.250 --> 00:06:18.150 Here we have nearly 30 years of data, 1990 to 2019. 153 00:06:18.150 --> 00:06:20.010 It's not a line chart going up and down and up and down 154 00:06:20.010 --> 00:06:22.020 and up and down showing me all the volatility.
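A slope graph like this keeps only each series' first and last values and draws one straight line between them. A minimal sketch of that reduction, assuming hypothetical yearly values keyed by year (the Atlas's real series aren't reproduced here):

```python
from __future__ import annotations


def endpoints(series: dict[int, float]) -> tuple[tuple[int, float], tuple[int, float]]:
    """Keep only the first and last year; the ups and downs in between are dropped."""
    first, last = min(series), max(series)
    return (first, series[first]), (last, series[last])


def average_change_per_year(series: dict[int, float]) -> float:
    """Slope of the single line a slope graph draws between the two endpoints."""
    (y0, v0), (y1, v1) = endpoints(series)
    return (v1 - v0) / (y1 - y0)


# Hypothetical, volatile series: a line chart would show every wiggle.
series = {
    "Region A": {1990: 50.0, 2000: 70.0, 2005: 40.0, 2019: 79.0},
    "Region B": {1990: 60.0, 2003: 45.0, 2012: 75.0, 2019: 60.0},
}
for name, values in series.items():
    print(name, endpoints(values), average_change_per_year(values))
```

Region B shows the trade-off: it swung widely in between, but its slope line is flat because it ended where it started, which is exactly the "we started here, we ended here" message a slope graph is meant to deliver.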
155 00:06:22.020 --> 00:06:23.700 The point of this part of the data story 156 00:06:23.700 --> 00:06:26.820 is simply: we started here, we ended here. 157 00:06:26.820 --> 00:06:29.400 So a single line with two data points, 158 00:06:29.400 --> 00:06:32.190 and the slope of the line showing that change, 159 00:06:32.190 --> 00:06:33.630 is a very effective visualization 160 00:06:33.630 --> 00:06:36.300 when you don't want to show, emphasize, 161 00:06:36.300 --> 00:06:38.850 or talk about all the ups and downs in between. 162 00:06:38.850 --> 00:06:40.650 Basic, but wise, right? 163 00:06:40.650 --> 00:06:42.300 And by the way, we can see very clearly, 164 00:06:42.300 --> 00:06:43.320 based on the slope of the line, 165 00:06:43.320 --> 00:06:45.660 and we're very good at reading the slope of a line, 166 00:06:45.660 --> 00:06:49.470 that Niger has a very different value than Zimbabwe, 167 00:06:49.470 --> 00:06:51.690 and by the way, sub-Saharan Africa 168 00:06:51.690 --> 00:06:55.110 looks very different from Europe and Central Asia. 169 00:06:55.110 --> 00:06:57.480 So slope graphs are great visualizations. 170 00:06:57.480 --> 00:07:01.110 And now, last but not least, this. 171 00:07:01.110 --> 00:07:03.300 Look at this. 172 00:07:03.300 --> 00:07:06.540 We have thousands of data points here 173 00:07:06.540 --> 00:07:10.020 very compressed into a tight space using a matrix heat map. 174 00:07:10.020 --> 00:07:13.290 We talked about these a few videos ago. 175 00:07:13.290 --> 00:07:15.480 These are really powerful. 176 00:07:15.480 --> 00:07:17.760 I have every single individual data point. 177 00:07:17.760 --> 00:07:21.930 If I want to see Malaysia in 2012 or whatever this is, 178 00:07:21.930 --> 00:07:24.930 95%, boom, there it is. 179 00:07:24.930 --> 00:07:29.430 At the same time, the big picture story isn't lost. 180 00:07:29.430 --> 00:07:31.440 This is looking at immunization rates, 181 00:07:31.440 --> 00:07:33.600 again, segmented by income level.
182 00:07:33.600 --> 00:07:36.120 Low income countries have much lower immunization rates 183 00:07:36.120 --> 00:07:37.020 than high income countries. 184 00:07:37.020 --> 00:07:38.490 I can see that from the color. 185 00:07:38.490 --> 00:07:41.310 And over time, though, just like in the high income countries, 186 00:07:41.310 --> 00:07:43.800 rates started out even lower, and they got better. 187 00:07:43.800 --> 00:07:45.090 And some outlier countries 188 00:07:45.090 --> 00:07:47.400 have pretty darn good immunization rates. 189 00:07:47.400 --> 00:07:50.250 High income countries were lower back in the day, 190 00:07:50.250 --> 00:07:54.120 much higher now, but consistently, over time, 191 00:07:54.120 --> 00:07:55.140 doing pretty well. 192 00:07:55.140 --> 00:07:56.730 There's clearly a pattern. 193 00:07:56.730 --> 00:07:59.070 The story is there. 194 00:07:59.070 --> 00:08:01.440 And yet I have access to individual data points. 195 00:08:01.440 --> 00:08:03.523 By the way, here's a little secret trick. 196 00:08:03.523 --> 00:08:07.140 If you're creating these, which you can do right now, 197 00:08:07.140 --> 00:08:10.890 go into Excel, go to your table, use conditional formatting, 198 00:08:10.890 --> 00:08:11.910 and use a color scale. 199 00:08:11.910 --> 00:08:14.850 Boom, now you have a highlight table, a matrix heat map. 200 00:08:14.850 --> 00:08:17.691 Use more creative colors than red and green. 201 00:08:17.691 --> 00:08:19.890 We talked about that already. 202 00:08:19.890 --> 00:08:22.500 And on top of that, here's a nice little trick. 203 00:08:22.500 --> 00:08:23.940 First of all, hide the values, 204 00:08:23.940 --> 00:08:25.320 because those are visually distracting. 205 00:08:25.320 --> 00:08:26.730 That's an easy thing to do in Excel. 206 00:08:26.730 --> 00:08:28.080 You can Google it. 207 00:08:28.080 --> 00:08:32.790 But also change the cell borders, 208 00:08:32.790 --> 00:08:35.130 the column and row separators.
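The highlight-table trick can also be sketched outside Excel. The hypothetical Python helper below (not an Excel feature or any library's API) writes an HTML table with a single-hue blue color scale instead of red and green, hides the printed values (each value survives only as a hover tooltip), and draws the cell separators in the background color, assumed here to be white. The colors and cell sizes are arbitrary choices.

```python
from __future__ import annotations

from html import escape


def blend(low: tuple[int, int, int], high: tuple[int, int, int], t: float) -> tuple[int, ...]:
    """Linear interpolation between two RGB colors, with t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def highlight_table(
    labels: list[str],
    rows: list[list[float]],
    low: tuple[int, int, int] = (239, 243, 255),  # pale blue for the smallest value
    high: tuple[int, int, int] = (8, 69, 148),    # dark blue for the largest value
    background: str = "#ffffff",                 # separators match the slide background
) -> str:
    values = [v for row in rows for v in row]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when every value is equal
    lines = ['<table style="border-collapse: collapse">']
    for label, row in zip(labels, rows):
        cells = []
        for v in row:
            r, g, b = blend(low, high, (v - lo) / span)
            # The cell is empty (value hidden); the title attribute keeps it as a tooltip.
            cells.append(
                f'<td title="{v}" style="background: rgb({r},{g},{b}); '
                f'border: 2px solid {background}; width: 14px; height: 14px"></td>'
            )
        lines.append(f"<tr><th>{escape(label)}</th>{''.join(cells)}</tr>")
    lines.append("</table>")
    return "\n".join(lines)


# Hypothetical immunization rates (%), one row per country, one column per year.
html = highlight_table(["Country A", "Country B"], [[60, 75, 95], [40, 55, 70]])
```

Passing a different `background` (say, a dark slide color) keeps the same spaced-apart look on non-white slides.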
209 00:08:35.130 --> 00:08:39.930 Instead of a gray or a black line, turn it white. 210 00:08:39.930 --> 00:08:41.400 That's what makes this look so nice, 211 00:08:41.400 --> 00:08:44.250 because these white separators look like space in between the individual cells, 212 00:08:44.250 --> 00:08:48.540 versus putting the data in jail. 213 00:08:48.540 --> 00:08:49.800 It's a simple little trick. 214 00:08:49.800 --> 00:08:51.750 That's assuming your slide background is white. 215 00:08:51.750 --> 00:08:53.580 Whatever your slide background color is, 216 00:08:53.580 --> 00:08:55.350 make the separators that color, 217 00:08:55.350 --> 00:08:57.450 and that makes the cells look like they're spaced apart 218 00:08:57.450 --> 00:09:00.870 rather than segmented in a harsher way. 219 00:09:00.870 --> 00:09:02.880 Nice little design trick. 220 00:09:02.880 --> 00:09:03.713 All right, another one. 221 00:09:03.713 --> 00:09:04.710 I love this one. 222 00:09:04.710 --> 00:09:05.610 This one's from Reuters again. 223 00:09:05.610 --> 00:09:07.350 I love Reuters, they do great work. 224 00:09:07.350 --> 00:09:09.150 Stopping the spread, reaching herd immunity 225 00:09:09.150 --> 00:09:10.170 through vaccination. 226 00:09:10.170 --> 00:09:12.780 This is a whole data story all about herd immunity 227 00:09:12.780 --> 00:09:14.040 and vaccination and blah, blah, blah. 228 00:09:14.040 --> 00:09:16.500 But these simulations are super cool. 229 00:09:16.500 --> 00:09:19.890 This is meant to explain how herd immunity works, 230 00:09:19.890 --> 00:09:21.480 what it is. 231 00:09:21.480 --> 00:09:24.300 And so each square is a person. 232 00:09:24.300 --> 00:09:26.250 Orange means they're infected, 233 00:09:26.250 --> 00:09:29.040 anything that isn't orange means not infected. 234 00:09:29.040 --> 00:09:31.650 Okay, and so what we can see is that 235 00:09:31.650 --> 00:09:35.070 when no one is vaccinated, almost everybody gets infected.
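To make the mechanism concrete, here is a toy version of that kind of simulation. It is not Reuters' model: it assumes a square grid where each person touches only their four neighbors, a vaccine that is 100% effective, and a single made-up transmission probability per contact (`p_transmit`). Each newly infected person gets one chance to infect each neighbor and then recovers.

```python
import random


def simulate_outbreak(size: int = 50, vaccinated_share: float = 0.0,
                      p_transmit: float = 0.5, seed: int = 0) -> float:
    """Return the share of the grid that ever gets infected, starting from one case."""
    rng = random.Random(seed)  # a different seed gives a visually different run
    n = size * size
    vaccinated = set(rng.sample(range(n), round(vaccinated_share * n)))
    susceptible = [i for i in range(n) if i not in vaccinated]
    if not susceptible:
        return 0.0
    first_case = rng.choice(susceptible)
    infected = {first_case}
    frontier = [first_case]  # people infected in the latest round
    while frontier:
        next_frontier = []
        for person in frontier:
            row, col = divmod(person, size)
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                r, c = row + dr, col + dc
                if not (0 <= r < size and 0 <= c < size):
                    continue  # off the edge of the grid
                neighbor = r * size + c
                if neighbor in vaccinated or neighbor in infected:
                    continue  # immune, or already had it
                if rng.random() < p_transmit:
                    infected.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return len(infected) / n


for share in (0.0, 0.2, 0.4, 0.6, 0.8):
    runs = [simulate_outbreak(size=40, vaccinated_share=share, p_transmit=0.7, seed=s)
            for s in range(10)]
    print(f"{share:.0%} vaccinated -> {sum(runs) / len(runs):.1%} infected on average")
```

In a toy like this, the average infected share falls off sharply as the vaccinated share rises, and where that drop happens depends entirely on the assumed `p_transmit` and neighborhood, which is why the "magic number" depends on the assumptions behind it.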
236 00:09:35.070 --> 00:09:38.640 At 20% and 40%, the vast majority still get infected. 237 00:09:38.640 --> 00:09:42.780 But once you hit 60%, under the simulation's assumptions 238 00:09:42.780 --> 00:09:44.730 about vaccine efficacy and infection rates... 239 00:09:44.730 --> 00:09:47.790 Obviously, it didn't work out quite this way in real life. 240 00:09:47.790 --> 00:09:49.770 But once you hit whatever the magic number is, 241 00:09:49.770 --> 00:09:53.100 based on those things, this is what happens. 242 00:09:53.100 --> 00:09:55.560 This is a live simulation; 243 00:09:55.560 --> 00:09:56.580 you can see that every time it runs, 244 00:09:56.580 --> 00:09:58.680 the pattern looks visually different. 245 00:09:58.680 --> 00:10:01.050 This shows you how herd immunity works. 246 00:10:01.050 --> 00:10:03.780 The disease runs out of people to infect very quickly 247 00:10:03.780 --> 00:10:07.110 once you reach a certain vaccination rate, 248 00:10:07.110 --> 00:10:09.930 sometimes more quickly, sometimes less. 249 00:10:09.930 --> 00:10:12.420 All right, great data story, really nicely visualized, 250 00:10:12.420 --> 00:10:16.020 using not a typical chart but a very effective 251 00:10:16.020 --> 00:10:17.610 visual mechanism. 252 00:10:17.610 --> 00:10:19.890 Last but not least, one of my favorite 253 00:10:19.890 --> 00:10:22.410 data visualizations of all time. 254 00:10:22.410 --> 00:10:27.180 This shows the murder rate in Honduras. 255 00:10:27.180 --> 00:10:29.250 You're supposed to take the magazine out, 256 00:10:29.250 --> 00:10:31.680 tack it up on a wall, and throw a dart at it, 257 00:10:31.680 --> 00:10:33.690 and your odds of being murdered, 258 00:10:33.690 --> 00:10:37.140 if you were a man in Honduras in 2012, 259 00:10:37.140 --> 00:10:40.260 were the same as the odds of hitting this rectangle, 260 00:10:40.260 --> 00:10:44.493 compared to the United States, compared to Singapore.
262 00:10:48.090 --> 00:10:52.590 One in 599, one in 13,000, one in 256,000. 263 00:10:52.590 --> 00:10:55.080 There's clearly much lower odds of happening than this, 264 00:10:55.080 --> 00:11:00.080 I get it, but humans are very bad at probabilities. 265 00:11:00.600 --> 00:11:02.160 So making it a visual experience 266 00:11:02.160 --> 00:11:05.280 and even a tactile experience is very powerful. 267 00:11:05.280 --> 00:11:08.310 I could hit this square by accident very easily. 268 00:11:08.310 --> 00:11:10.830 I could literally never, ever, ever, ever, 269 00:11:10.830 --> 00:11:12.420 ever hit this square. 270 00:11:12.420 --> 00:11:13.253 It aint gonna happen. 271 00:11:13.253 --> 00:11:15.810 You are not gonna be murdered in Singapore. 272 00:11:15.810 --> 00:11:18.750 So extremely effective data story telling. 273 00:11:18.750 --> 00:11:20.520 Two paragraphs of text. 274 00:11:20.520 --> 00:11:25.520 Very simple, but very effective data visualization 275 00:11:26.970 --> 00:11:28.350 with a few squares. 276 00:11:28.350 --> 00:11:31.050 These squares would need to be sized by area, 277 00:11:31.050 --> 00:11:33.630 not width or height, double the area, 278 00:11:33.630 --> 00:11:36.810 not double the height if it's double the number. 279 00:11:36.810 --> 00:11:38.880 But I think we could all execute on this 280 00:11:38.880 --> 00:11:42.150 in PowerPoint in five minutes, right? 281 00:11:42.150 --> 00:11:44.700 So this is an example of don't be hung up on your charts. 282 00:11:44.700 --> 00:11:47.190 You might be tempted to show this as a bar chart. 283 00:11:47.190 --> 00:11:48.600 Maybe there's another way. 284 00:11:48.600 --> 00:11:51.003 You can come up with better ideas.