WEBVTT 1 00:00:00.044 --> 00:00:02.580 Like we talked about in the last video, 2 00:00:02.580 --> 00:00:05.370 picking charts is all about asking questions, right? 3 00:00:05.370 --> 00:00:09.060 The why's, not the ones we keep talking about of course, 4 00:00:09.060 --> 00:00:12.210 also what tasks you want to enable in your audience. 5 00:00:12.210 --> 00:00:14.070 And there are other questions 6 00:00:14.070 --> 00:00:16.590 and considerations you should be taking into account. 7 00:00:16.590 --> 00:00:20.520 So, for instance, we should always be asking ourselves 8 00:00:20.520 --> 00:00:22.980 the question compared to what? 9 00:00:22.980 --> 00:00:26.640 Now we talked about comparison as being a specific task, 10 00:00:26.640 --> 00:00:30.630 which you may or may not be really wanting your audience 11 00:00:30.630 --> 00:00:32.760 to perform that task of comparison. 12 00:00:32.760 --> 00:00:34.410 Literally compare this to this, 13 00:00:34.410 --> 00:00:36.000 and understand how much the difference is. 14 00:00:36.000 --> 00:00:38.070 Okay, that's a specific task. 15 00:00:38.070 --> 00:00:40.320 But even when that's not the task, 16 00:00:40.320 --> 00:00:42.600 maybe you're showing a trend over time, 17 00:00:42.600 --> 00:00:46.500 but you still in sort of a more subtle way, 18 00:00:46.500 --> 00:00:49.680 have to ask yourself well what is this compared to? 19 00:00:49.680 --> 00:00:51.300 Not because that's the task you're enabling 20 00:00:51.300 --> 00:00:52.200 in your audience, 21 00:00:52.200 --> 00:00:54.750 but because we always need that context 22 00:00:54.750 --> 00:00:56.160 to understand the data. 23 00:00:56.160 --> 00:00:57.430 So as an example of that 24 00:00:58.551 --> 00:01:00.700 there's this data story from Reuters, 25 00:01:00.700 --> 00:01:03.150 which was looking at the use of plastic bottles, okay? 26 00:01:03.150 --> 00:01:06.900 You know, individual use, single use plastic bottles. 
27 00:01:06.900 --> 00:01:10.890 And this visualization was part of that story. 28 00:01:10.890 --> 00:01:13.033 And the basic idea was, 29 00:01:13.033 --> 00:01:15.476 if I remember correctly this is the annual, 30 00:01:15.476 --> 00:01:20.340 the number of individual single use plastic bottles 31 00:01:20.340 --> 00:01:23.010 produced annually, maybe globally. 32 00:01:23.010 --> 00:01:23.970 I can't remember. 33 00:01:23.970 --> 00:01:26.250 Long story short it was like 1.3 trillion 34 00:01:26.250 --> 00:01:27.840 or some crazy number, 35 00:01:27.840 --> 00:01:29.730 which is like well what does that mean? 36 00:01:29.730 --> 00:01:31.080 Well compared to what? 37 00:01:31.080 --> 00:01:32.430 Well, one way of comparison, 38 00:01:32.430 --> 00:01:34.590 one way of making that number relatable, 39 00:01:34.590 --> 00:01:38.610 is to show us visually what a giant pile of them 40 00:01:38.610 --> 00:01:39.443 would look like. 41 00:01:39.443 --> 00:01:41.250 And yes, it would be a mountain 42 00:01:41.250 --> 00:01:44.550 that would dwarf Manhattan okay? 43 00:01:44.550 --> 00:01:47.610 So compared to what is something we should ask ourselves 44 00:01:47.610 --> 00:01:50.040 simply to allow ourselves to make the numbers relatable 45 00:01:50.040 --> 00:01:52.650 to our audience, even if the task is not comparison. 46 00:01:52.650 --> 00:01:55.894 Okay, another question we should always be asking 47 00:01:55.894 --> 00:01:58.770 is are these numbers good or bad? 48 00:01:58.770 --> 00:02:01.470 Now, we ask ourselves that for a variety of reasons. 49 00:02:01.470 --> 00:02:04.050 One of which is that I'm gonna be communicating them 50 00:02:04.050 --> 00:02:07.380 with somebody and should I be helping them understand 51 00:02:07.380 --> 00:02:08.850 whether they're good or bad? 52 00:02:08.850 --> 00:02:10.932 Well, that's one question. 53 00:02:10.932 --> 00:02:12.180 Sometimes you don't need to emphasize good or bad. 
54 00:02:12.180 --> 00:02:14.040 Other times you do want to. 55 00:02:14.040 --> 00:02:17.100 And then also how would I do that? 56 00:02:17.100 --> 00:02:19.020 So this is an example of great visualization 57 00:02:19.020 --> 00:02:23.700 from the Wall Street Journal looking at polio over time. 58 00:02:23.700 --> 00:02:27.660 And so when we use what's called a matrix heat map, 59 00:02:27.660 --> 00:02:29.550 this is a matrix diagram, but it's a heat map 60 00:02:29.550 --> 00:02:32.550 because we're using intensity of color to display the data. 61 00:02:32.550 --> 00:02:34.380 It's a really cool visualization type 62 00:02:34.380 --> 00:02:36.060 that you should definitely add to your repertoire 63 00:02:36.060 --> 00:02:38.107 if you're not always using them because it's, 64 00:02:38.107 --> 00:02:42.690 it has the beautiful effect of allowing us to see 65 00:02:42.690 --> 00:02:45.870 all the data, tons of data points here, right? 66 00:02:45.870 --> 00:02:48.390 In fact, in fact we have essentially 50 states, 67 00:02:48.390 --> 00:02:49.533 40 years of data. 68 00:02:50.371 --> 00:02:51.630 We have 2000 data points here. 69 00:02:51.630 --> 00:02:54.540 And yet I don't lose the big picture story, okay? 70 00:02:54.540 --> 00:02:56.970 Now in terms of the good bad question, 71 00:02:56.970 --> 00:03:00.330 I'm using color here obviously to indicate good and bad. 72 00:03:00.330 --> 00:03:05.070 We're using red for really bad, yellow for pretty bad right? 73 00:03:05.070 --> 00:03:06.030 And then blue, 74 00:03:06.030 --> 00:03:09.210 or actually I think in this case white for good. 75 00:03:09.210 --> 00:03:11.400 In other words it disappears to show goodness, 76 00:03:11.400 --> 00:03:14.490 a lack of polio. 77 00:03:14.490 --> 00:03:16.350 So that's an interesting distinction too by the way. 78 00:03:16.350 --> 00:03:19.410 So sometimes good, you know we've got the typical red green 79 00:03:19.410 --> 00:03:21.158 right thing going. 
80 00:03:21.158 --> 00:03:22.830 Because of color blindness, 81 00:03:22.830 --> 00:03:25.470 which I think we'll talk about a little bit later on, 82 00:03:25.470 --> 00:03:27.990 you can avoid red green, instead use red blue. 83 00:03:27.990 --> 00:03:29.460 But we're not actually using blue 84 00:03:29.460 --> 00:03:31.890 as the opposite end of the spectrum for this visual. 85 00:03:31.890 --> 00:03:33.570 We're using white because we're, 86 00:03:33.570 --> 00:03:36.090 we're communicating a lack of. 87 00:03:36.090 --> 00:03:37.290 And so it's sort of a different thing. 88 00:03:37.290 --> 00:03:39.960 More of like fading into the background of the chart, 89 00:03:39.960 --> 00:03:41.580 of the slide rather. 90 00:03:41.580 --> 00:03:45.030 Long story short, we should ask ourselves is it good or bad? 91 00:03:45.030 --> 00:03:47.493 And how are we gonna depict that if we need to depict that? 92 00:03:47.493 --> 00:03:50.217 Another question we should ask is 93 00:03:50.217 --> 00:03:52.896 about big versus small numbers. 94 00:03:52.896 --> 00:03:56.156 And this brings up a couple different points. 95 00:03:56.156 --> 00:03:58.200 I'm gonna show you the actual data story here. 96 00:03:58.200 --> 00:04:01.299 So this is Roger Federer, 20 years, 20 titles, 97 00:04:01.299 --> 00:04:03.985 and it starts off with just a very brief visual 98 00:04:03.985 --> 00:04:06.500 showing a timeline. 99 00:04:06.500 --> 00:04:09.856 And by the way, time is flowing down, 100 00:04:09.856 --> 00:04:11.460 which is a weird choice. 101 00:04:11.460 --> 00:04:13.810 Time usually flows left to right. 102 00:04:13.810 --> 00:04:16.848 I'll try to remember to talk about that at some point. 103 00:04:16.848 --> 00:04:20.700 But what we have here is rank going from left to right. 
104 00:04:20.700 --> 00:04:23.315 And you may remember when we talked about 105 00:04:23.315 --> 00:04:25.260 pre-attentive processing and we talked about the idea 106 00:04:25.260 --> 00:04:27.990 that the mountains are down here and the sky is up above, 107 00:04:27.990 --> 00:04:30.090 and that low numbers should be down 108 00:04:30.090 --> 00:04:32.340 and high numbers should be up. 109 00:04:32.340 --> 00:04:34.536 And also that low numbers should be to the left 110 00:04:34.536 --> 00:04:37.316 and high numbers should be to the right usually. 111 00:04:37.316 --> 00:04:40.155 But it is also a best practice, 112 00:04:40.155 --> 00:04:44.317 that bad should be left and down, 113 00:04:44.317 --> 00:04:47.254 and good should be up and to the right. 114 00:04:47.254 --> 00:04:50.220 So we have competing best practices here, 115 00:04:50.220 --> 00:04:55.220 because a high number with rank, he was ranked 500th is bad, 116 00:04:57.540 --> 00:05:01.140 and a low number, number one, is good. 117 00:05:01.140 --> 00:05:04.590 So this was a very intentional decision 118 00:05:04.590 --> 00:05:08.730 to change this scale so it goes high number to low number 119 00:05:08.730 --> 00:05:09.563 left to right. 120 00:05:10.596 --> 00:05:12.960 Because they wanted to show bad to good, left to right. 121 00:05:12.960 --> 00:05:16.560 So you gotta be thoughtful about this stuff, right? 122 00:05:16.560 --> 00:05:19.159 Is this a, first of all is this a big 123 00:05:19.159 --> 00:05:20.477 or small number, right? 124 00:05:20.477 --> 00:05:21.570 Compared to what conversation? 125 00:05:21.570 --> 00:05:24.120 And how am I gonna depict that? 126 00:05:24.120 --> 00:05:25.920 Be very thoughtful 127 00:05:25.920 --> 00:05:28.380 and intentional about decisions like this. 
128 00:05:28.380 --> 00:05:30.810 And by the way, the reason time flows down in this case 129 00:05:30.810 --> 00:05:33.180 and not left to right is because they use this 130 00:05:33.180 --> 00:05:35.490 as a mechanism in the navigation. 131 00:05:35.490 --> 00:05:37.927 As you scroll down, the story sort of scrolls down, 132 00:05:37.927 --> 00:05:40.260 you know, and it would take too much horizontal space 133 00:05:40.260 --> 00:05:41.093 to do it that way. 134 00:05:41.093 --> 00:05:42.510 Not to mention you're scrolling this way. 135 00:05:42.510 --> 00:05:45.852 So again, very intentional decision to do it that way, 136 00:05:45.852 --> 00:05:48.420 but it sort of breaks some best practices 137 00:05:48.420 --> 00:05:51.660 but they did it appropriately I believe. 138 00:05:51.660 --> 00:05:55.950 Long story short, good and bad, up and down, 139 00:05:55.950 --> 00:05:57.690 left and right, whatever, 140 00:05:57.690 --> 00:06:00.000 time generally flows left to right, 141 00:06:00.000 --> 00:06:03.630 but you can break that with intention okay? 142 00:06:03.630 --> 00:06:08.375 Alright so, another consideration is time. 143 00:06:08.375 --> 00:06:10.320 Not only is it available, 144 00:06:10.320 --> 00:06:15.320 do I have data with time as a component, but does it matter? 145 00:06:15.780 --> 00:06:18.265 Should I be showing that to my audience? 146 00:06:18.265 --> 00:06:20.460 Not always, right? 147 00:06:20.460 --> 00:06:23.460 I may have had, you know, a hundred years of data 148 00:06:23.460 --> 00:06:25.467 about the Olympics for that data story, 149 00:06:25.467 --> 00:06:27.270 but I could have said 150 00:06:27.270 --> 00:06:30.300 just look at the most recent Olympics. 151 00:06:30.300 --> 00:06:32.820 I could have just honed in on the most recent Olympics. 152 00:06:32.820 --> 00:06:35.220 In fact I did end up honing in just on the most recent 153 00:06:35.220 --> 00:06:36.270 40 years of Olympics. 
154 00:06:36.270 --> 00:06:38.100 I didn't use the entire time series, 155 00:06:38.100 --> 00:06:40.203 and I didn't really show, 156 00:06:41.559 --> 00:06:42.450 in fact I don't think I showed time at all, right? 157 00:06:42.450 --> 00:06:45.330 I didn't show how it progressed over time. 158 00:06:45.330 --> 00:06:46.800 I actually did have one hover effect 159 00:06:46.800 --> 00:06:48.762 which I don't think I showed you all where I do show that, 160 00:06:48.762 --> 00:06:51.300 but that wasn't the point of my data story. 161 00:06:51.300 --> 00:06:55.980 I used time, a timed data set to explore. 162 00:06:55.980 --> 00:06:58.710 But the data story was not about showing time. 163 00:06:58.710 --> 00:07:00.780 But if it is, obviously you're gonna choose charts 164 00:07:00.780 --> 00:07:01.653 that enable that. 165 00:07:02.520 --> 00:07:07.500 One of the most important and common questions 166 00:07:07.500 --> 00:07:11.460 you should be asking when visualizing any data is 167 00:07:11.460 --> 00:07:15.120 whether the numbers themselves tell the story, 168 00:07:15.120 --> 00:07:18.510 the actual values, or if you should be converting those 169 00:07:18.510 --> 00:07:22.770 values into a rate, okay? 170 00:07:22.770 --> 00:07:25.650 As an example, I did this story a few years back 171 00:07:25.650 --> 00:07:27.867 looking at minimum wage. 172 00:07:27.867 --> 00:07:31.920 What can you buy with minimum wage over time? 173 00:07:31.920 --> 00:07:35.970 So 1990, minimum wage was whatever it was, 174 00:07:35.970 --> 00:07:38.220 over time it went up and it sort of jumps up at it, 175 00:07:38.220 --> 00:07:40.350 you know that's why it's sort of this stepwise diagram 176 00:07:40.350 --> 00:07:42.960 where it's not like a diagonal line, 177 00:07:42.960 --> 00:07:45.030 and it jumps up and it gets to this point, okay? 178 00:07:45.030 --> 00:07:46.470 That's the minimum wage. 
179 00:07:46.470 --> 00:07:49.410 The question is how many eggs could I buy 180 00:07:49.410 --> 00:07:51.630 over that same time period? 181 00:07:51.630 --> 00:07:54.870 So now we're looking at the price of a dozen eggs. 182 00:07:54.870 --> 00:07:59.870 In 1990 a dozen eggs cost 88 cents, or sorry, 1980. 183 00:08:01.470 --> 00:08:05.790 And by 2012, a dozen eggs cost $1.94. 184 00:08:05.790 --> 00:08:08.760 Okay, that's the value, that's the actual number. 185 00:08:08.760 --> 00:08:11.787 But the question is for this data story, 186 00:08:11.787 --> 00:08:14.790 what could I buy with minimum wage? 187 00:08:14.790 --> 00:08:17.670 So I have to convert it into a rate, 188 00:08:17.670 --> 00:08:20.040 a ratio of essentially the price of eggs 189 00:08:20.040 --> 00:08:21.964 divided by minimum wage or vice versa. 190 00:08:21.964 --> 00:08:25.980 In that case, the price of eggs actually only went up by 6%. 191 00:08:25.980 --> 00:08:29.580 Right, the real number, price of eggs went up 221%. 192 00:08:29.580 --> 00:08:33.000 But when I compare it to minimum wage, it barely went up. 193 00:08:33.000 --> 00:08:35.280 Electricity actually went down. 194 00:08:35.280 --> 00:08:38.763 You can buy more electricity today with minimum wage 195 00:08:38.763 --> 00:08:41.340 than you could 40 years ago, 196 00:08:41.340 --> 00:08:44.820 or in that case 32 years ago or whatever it was. 197 00:08:44.820 --> 00:08:47.970 So converting numbers into rates, right? 198 00:08:47.970 --> 00:08:52.590 This is why we convert numbers into percentages, per capita, 199 00:08:52.590 --> 00:08:54.330 per 1,000, per 100,000, 200 00:08:54.330 --> 00:08:56.940 because the numbers themselves are useless, right? 201 00:08:56.940 --> 00:08:59.190 Easiest example you can remember from your own lives. 202 00:08:59.190 --> 00:09:01.350 Covid, right? 203 00:09:01.350 --> 00:09:04.680 What is the rate of Covid infection? 
204 00:09:04.680 --> 00:09:06.810 You know, if you said well, you know this, 205 00:09:06.810 --> 00:09:09.979 Rhode Island has a hundred thousand infections 206 00:09:09.979 --> 00:09:12.480 and California has 150,000, 207 00:09:12.480 --> 00:09:14.130 oh no, California is in big trouble. 208 00:09:14.130 --> 00:09:15.150 No they're not. 209 00:09:15.150 --> 00:09:17.550 California is like 20 times the size of Rhode Island 210 00:09:17.550 --> 00:09:19.260 or maybe even more, probably more. 211 00:09:19.260 --> 00:09:21.798 So it's very different. 212 00:09:21.798 --> 00:09:23.400 You have to convert it into a per 100,000 rate 213 00:09:23.400 --> 00:09:25.800 in order to understand what the numbers really mean. 214 00:09:25.800 --> 00:09:26.820 Okay. 215 00:09:26.820 --> 00:09:29.220 Other considerations are the scales. 216 00:09:29.220 --> 00:09:32.550 What are the scales of your chart and how do you set them? 217 00:09:32.550 --> 00:09:33.870 How do you use them? 218 00:09:33.870 --> 00:09:36.210 We're gonna talk more about that later on. 219 00:09:36.210 --> 00:09:39.060 Also, very important, especially for those of you who are 220 00:09:39.060 --> 00:09:44.060 in science or any discipline where your measurements 221 00:09:44.400 --> 00:09:46.980 include uncertainty which you know, let's be honest, 222 00:09:46.980 --> 00:09:49.620 almost all data analytics includes uncertainty. 223 00:09:49.620 --> 00:09:52.260 Sometimes it's really important to communicate 224 00:09:52.260 --> 00:09:54.030 that uncertainty to your audience. 225 00:09:54.030 --> 00:09:56.970 As an example, we're looking at what's really warming 226 00:09:56.970 --> 00:09:58.560 the world from Bloomberg. 227 00:09:58.560 --> 00:10:00.360 Great data story. 228 00:10:00.360 --> 00:10:02.730 They're showing us the observed temperature over time. 229 00:10:02.730 --> 00:10:03.750 This is a real number. 230 00:10:03.750 --> 00:10:05.520 So we don't need to show uncertainty here. 
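Going back to the per-100,000 conversion from the Covid example a moment ago, here is a minimal sketch of the arithmetic. The infection counts are the made-up figures used in the lecture. The populations (about 1.1 million for Rhode Island, about 39 million for California) are rough assumptions added for illustration, and `rate_per_100k` is a hypothetical helper name.

```python
def rate_per_100k(count, population):
    """Convert a raw count into a rate per 100,000 people."""
    if population <= 0:
        raise ValueError("population must be positive")
    return count / population * 100_000


# Infection counts are the illustrative figures from the lecture;
# populations are rough assumed values, not exact census numbers.
states = {
    "Rhode Island": {"infections": 100_000, "population": 1_100_000},
    "California": {"infections": 150_000, "population": 39_000_000},
}

rates = {
    name: rate_per_100k(d["infections"], d["population"])
    for name, d in states.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} per 100,000")
```

Under these assumptions the smaller raw count turns into the far larger rate, which is exactly why the raw numbers on their own mislead.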
231 00:10:05.520 --> 00:10:07.370 This is actually based on thermometers okay, 232 00:10:07.370 --> 00:10:08.910 so we're good with that one. 233 00:10:08.910 --> 00:10:11.910 But this one is aimed at climate skeptics, saying alright, 234 00:10:11.910 --> 00:10:15.423 for you skeptics, you probably might say something like, 235 00:10:17.070 --> 00:10:19.590 maybe it's all about the orbital changes of the earth. 236 00:10:19.590 --> 00:10:23.100 Maybe that's what changed that observed temperature. 237 00:10:23.100 --> 00:10:28.050 Well, in fact no, here is an average of orbital changes 238 00:10:28.050 --> 00:10:29.280 based on our computer models 239 00:10:29.280 --> 00:10:32.790 with the effect that would have on earth temperature. 240 00:10:32.790 --> 00:10:36.000 But they don't just show the line, which feels very precise. 241 00:10:36.000 --> 00:10:37.263 They show the band. 242 00:10:38.508 --> 00:10:41.670 This is the 95% confidence interval for that value. 243 00:10:41.670 --> 00:10:44.010 And they do this again and again and again. 244 00:10:44.010 --> 00:10:47.340 This is the effect of the sun 245 00:10:47.340 --> 00:10:49.470 that we would expect it to have on climate. 246 00:10:49.470 --> 00:10:50.760 It could have some effect, 247 00:10:50.760 --> 00:10:52.320 or might literally have a negative effect. 248 00:10:52.320 --> 00:10:56.070 Long story short, this is the range of uncertainty. 249 00:10:56.070 --> 00:10:59.040 And so this whole data story is like this. 250 00:10:59.040 --> 00:11:01.050 And the thing about this is it shows, 251 00:11:01.050 --> 00:11:02.820 it's transparent, right? 252 00:11:02.820 --> 00:11:04.740 And allows people who are gonna nitpick the data 253 00:11:04.740 --> 00:11:06.240 and say well, this didn't happen. 254 00:11:06.240 --> 00:11:07.380 This number isn't real. 255 00:11:07.380 --> 00:11:08.965 Yeah, of course not. 256 00:11:08.965 --> 00:11:10.050 There's this range of possibilities. 
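To make the band idea concrete, here is a hedged sketch of one common way to get a lower and upper line around an average: a normal-approximation 95% interval for the mean across several model runs. This is not a claim about how Bloomberg computed its bands. The run values are invented, and `confidence_band` is a hypothetical helper.

```python
import statistics


def confidence_band(runs, z=1.96):
    """Return (lower, mean, upper) per time step from several model runs.

    runs: list of equal-length lists, one list per model run.
    Uses a normal approximation for the 95% interval of the mean.
    """
    band = []
    for values in zip(*runs):
        mean = statistics.mean(values)
        # Standard error of the mean across runs at this time step.
        sem = statistics.stdev(values) / len(values) ** 0.5
        band.append((mean - z * sem, mean, mean + z * sem))
    return band


# Invented temperature anomalies from three hypothetical model runs.
runs = [
    [0.00, 0.05, 0.10],
    [0.02, 0.01, 0.08],
    [-0.02, 0.03, 0.12],
]
for lower, mean, upper in confidence_band(runs):
    print(f"{lower:+.3f}  {mean:+.3f}  {upper:+.3f}")
```

With only a handful of runs, a t-based multiplier would be more defensible than 1.96. To draw the band, a filled area between the lower and upper series works well, for example with matplotlib's `fill_between`.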
257 00:11:10.050 --> 00:11:12.946 Let's acknowledge that in our visuals. 258 00:11:12.946 --> 00:11:17.850 It's also very helpful, very important sometimes to think 259 00:11:17.850 --> 00:11:22.850 about whether our data can be compared side by side. 260 00:11:23.490 --> 00:11:26.970 Again, using Covid as the example, New York City. 261 00:11:26.970 --> 00:11:29.340 Their first infection happened in March 262 00:11:29.340 --> 00:11:31.710 or whenever it was of 2020, 263 00:11:31.710 --> 00:11:33.952 and then it went, skyrocketed, right? 264 00:11:33.952 --> 00:11:36.013 Idaho or Montana, 265 00:11:36.013 --> 00:11:38.790 their infections didn't start until months later. 266 00:11:38.790 --> 00:11:41.340 Or at least it didn't get bad until months later. 267 00:11:41.340 --> 00:11:43.530 So if we looked at them on a time series 268 00:11:43.530 --> 00:11:45.150 and then we had these two lines going like this 269 00:11:45.150 --> 00:11:47.336 but they're months apart, it's hard to compare them. 270 00:11:47.336 --> 00:11:50.100 So what the New York Times and a lot of others did, 271 00:11:50.100 --> 00:11:51.930 Financial Times, many others, 272 00:11:51.930 --> 00:11:54.690 is they realized of course no, we have to index these. 273 00:11:54.690 --> 00:11:57.060 Instead, instead of having their charts have a scale 274 00:11:57.060 --> 00:11:58.770 based on time, 275 00:11:58.770 --> 00:12:01.140 they had all of their charts on a scale based on 276 00:12:01.140 --> 00:12:02.940 here's day one of infection, 277 00:12:02.940 --> 00:12:04.950 that's the leftmost part in the chart. 278 00:12:04.950 --> 00:12:07.170 And then all of the lines start in the same place. 
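The day-one indexing described here can be sketched as follows: replace each calendar date with the number of days since that place's first recorded case, so that every series starts at zero. The counts below are invented, and `days_since_first_case` is a hypothetical helper, not something taken from the New York Times or Financial Times charts.

```python
from datetime import date


def days_since_first_case(series):
    """Re-index a dated series by days since its first nonzero value.

    series: list of (date, count) pairs sorted by date.
    Returns a list of (day_number, count) pairs starting at day 0.
    """
    start = next((d for d, count in series if count > 0), None)
    if start is None:
        return []
    return [((d - start).days, count) for d, count in series if d >= start]


# Invented counts; the dates only mimic a March start versus a June start.
new_york = [(date(2020, 3, 1), 1), (date(2020, 3, 2), 4), (date(2020, 3, 3), 9)]
montana = [(date(2020, 6, 1), 0), (date(2020, 6, 2), 2), (date(2020, 6, 3), 5)]

print(days_since_first_case(new_york))
print(days_since_first_case(montana))
```

Once both series share the same day-number axis, they can be plotted on one chart and compared directly, even though the outbreaks began months apart.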
279 00:12:07.170 --> 00:12:09.599 So even though this happened in March, 280 00:12:09.599 --> 00:12:10.623 this happened in June, 281 00:12:10.623 --> 00:12:11.760 we can put 'em side by side because it's all about 282 00:12:11.760 --> 00:12:13.980 how many days since the first infection 283 00:12:13.980 --> 00:12:16.110 and what do their infection rates look like. 284 00:12:16.110 --> 00:12:18.210 It's not always appropriate to do it that way, 285 00:12:18.210 --> 00:12:21.210 but it's very appropriate for those purposes. 286 00:12:21.210 --> 00:12:24.300 Another great lesson proven by research. 287 00:12:24.300 --> 00:12:26.670 You should feel free to use a bunch of small charts. 288 00:12:26.670 --> 00:12:29.460 I showed this one maybe in the previous video I think. 289 00:12:29.460 --> 00:12:31.963 This is the Wall Street Journal weather page. 290 00:12:31.963 --> 00:12:36.346 Now what we have here is each city 291 00:12:36.346 --> 00:12:38.518 and there's like 20 maybe more of 'em. 292 00:12:38.518 --> 00:12:39.780 We have 10 days of weather. 293 00:12:39.780 --> 00:12:43.410 We have the record high and low, the normal high and low, 294 00:12:43.410 --> 00:12:47.400 the actual high and low, and or the forecast high and low. 295 00:12:47.400 --> 00:12:49.260 And the forecast has a range. 296 00:12:49.260 --> 00:12:52.620 So first of all, crazy awesome distribution diagrams, 297 00:12:52.620 --> 00:12:55.590 and for financial, for Wall Street Journal readers, 298 00:12:55.590 --> 00:12:56.910 you know, they'll appreciate this. 299 00:12:56.910 --> 00:12:59.282 This would be terrible for USA Today right? 300 00:12:59.282 --> 00:13:03.660 We have thousands of data points here, okay? 301 00:13:03.660 --> 00:13:05.100 Thousands of them. 
302 00:13:05.100 --> 00:13:06.870 Now, if I tried to do this in one chart 303 00:13:06.870 --> 00:13:09.210 and show you Atlanta on top of Boston on top of Chicago, 304 00:13:09.210 --> 00:13:11.507 this would be ridiculous, like you couldn't even do it. 305 00:13:11.507 --> 00:13:14.160 But we know from research if you show people a bunch 306 00:13:14.160 --> 00:13:16.590 of small charts, they're fine at that. 307 00:13:16.590 --> 00:13:18.120 We're good at that. 308 00:13:18.120 --> 00:13:19.950 So it's okay to use small multiples. 309 00:13:19.950 --> 00:13:20.783 Now you may say in this case 310 00:13:20.783 --> 00:13:22.620 well I don't need to compare Atlanta to Detroit 311 00:13:22.620 --> 00:13:24.824 'cause I either live in one or the other. 312 00:13:24.824 --> 00:13:25.657 That's all I care about. 313 00:13:25.657 --> 00:13:26.490 Sure, fair enough. 314 00:13:26.490 --> 00:13:27.690 Maybe this isn't the best example. 315 00:13:27.690 --> 00:13:29.220 Let's look at this one. 316 00:13:29.220 --> 00:13:31.290 This is looking at the over representation 317 00:13:31.290 --> 00:13:36.290 or the under representation of different states in terms 318 00:13:36.840 --> 00:13:39.420 of their voting power essentially. 319 00:13:39.420 --> 00:13:41.880 And so here I probably do want to compare some 320 00:13:41.880 --> 00:13:43.278 of these states. 321 00:13:43.278 --> 00:13:45.000 You know, California is grossly underrepresented. 322 00:13:45.000 --> 00:13:47.310 It's a very large population state, 323 00:13:47.310 --> 00:13:49.260 but because of the way districting works 324 00:13:49.260 --> 00:13:51.000 and a variety of other things, 325 00:13:51.000 --> 00:13:52.710 they don't get as much power in Congress compared 326 00:13:52.710 --> 00:13:55.440 to that population size as some other states. 327 00:13:55.440 --> 00:13:58.050 Montana is overrepresented, okay? 
328 00:13:58.050 --> 00:14:01.664 So I can easily compare these two to each other, 329 00:14:01.664 --> 00:14:03.480 even though it's a bunch of small charts. 330 00:14:03.480 --> 00:14:05.760 Now obviously in this particular case 331 00:14:05.760 --> 00:14:07.770 you all will appreciate this one, especially the GIS people. 332 00:14:07.770 --> 00:14:09.600 Not only is this small multiples, 333 00:14:09.600 --> 00:14:12.128 but this is essentially a map, right? 334 00:14:12.128 --> 00:14:14.130 This is, this is essentially, Maine is in the right place, 335 00:14:14.130 --> 00:14:15.280 Vermont, New Hampshire. 336 00:14:16.145 --> 00:14:17.940 I can read this like a map, 337 00:14:17.940 --> 00:14:19.530 but it's also a series of charts. 338 00:14:19.530 --> 00:14:21.000 I love this example. 339 00:14:21.000 --> 00:14:22.530 Okay, and last but not least, 340 00:14:22.530 --> 00:14:24.870 in terms of these other considerations, 341 00:14:24.870 --> 00:14:27.030 you should always think about whether you want to 342 00:14:27.030 --> 00:14:29.909 build your charts over time. 343 00:14:29.909 --> 00:14:34.909 Complex charts can be complex, hard to digest, 344 00:14:35.505 --> 00:14:39.600 particularly for a less informed audience. 345 00:14:39.600 --> 00:14:42.960 But you can build to a very complex chart 346 00:14:42.960 --> 00:14:45.960 by stepping people through it one bit at a time. 347 00:14:45.960 --> 00:14:48.540 So as an example that's sort of that, 348 00:14:48.540 --> 00:14:51.090 this is a project that I created called, you know, 349 00:14:51.090 --> 00:14:53.640 trends in US occupations over time. 350 00:14:53.640 --> 00:14:55.620 And I was looking at, this is a bump chart. 351 00:14:55.620 --> 00:14:58.950 We talked about that earlier, or just a rank position chart. 352 00:14:58.950 --> 00:15:01.110 Retail salespersons was the number one job 353 00:15:01.110 --> 00:15:03.540 for all 20 years in the data set. 
354 00:15:03.540 --> 00:15:06.540 Cashiers, number two job for most of the time, 355 00:15:06.540 --> 00:15:08.621 then dropped to number three 356 00:15:08.621 --> 00:15:09.454 for the last few years, et cetera. 357 00:15:09.454 --> 00:15:11.160 So this is a fairly complex chart 358 00:15:11.160 --> 00:15:12.480 and this is sort of like a build. 359 00:15:12.480 --> 00:15:14.930 But essentially as I walk through the data story, 360 00:15:16.750 --> 00:15:18.240 I highlight the different parts of the chart 361 00:15:18.240 --> 00:15:21.450 to reveal to you visually what I'm talking about 362 00:15:21.450 --> 00:15:24.060 in the text going by on the left hand side. 363 00:15:24.060 --> 00:15:25.920 So this is slightly different than a build, 364 00:15:25.920 --> 00:15:28.920 but very similar idea that as I'm telling this story, 365 00:15:28.920 --> 00:15:30.210 I'm gonna visually walk you through 366 00:15:30.210 --> 00:15:31.530 those parts of the story. 367 00:15:31.530 --> 00:15:33.990 I could have literally turned off all the rest of the chart 368 00:15:33.990 --> 00:15:36.087 and literally just built it bit by bit by bit. 369 00:15:36.087 --> 00:15:40.740 Either way, I'm using the visual and the text. 370 00:15:40.740 --> 00:15:43.290 I'm sort of combining the two to really make it easier 371 00:15:43.290 --> 00:15:45.149 for you to follow along. 372 00:15:45.149 --> 00:15:48.300 Listen, the most important thing I can tell you 373 00:15:48.300 --> 00:15:51.726 about chart selection is that it's all about intentionality. 374 00:15:51.726 --> 00:15:55.770 You have to have a reason to do every single thing 375 00:15:55.770 --> 00:15:56.670 that you're doing. 376 00:15:57.922 --> 00:15:58.920 We should not mindlessly pick charts 377 00:15:58.920 --> 00:16:00.810 by clicking buttons in Excel. 
378 00:16:00.810 --> 00:16:03.270 We should pick charts for a reason 379 00:16:03.270 --> 00:16:05.129 based on all these questions that we're asking, 380 00:16:05.129 --> 00:16:07.770 based on the tasks we're trying to enable, 381 00:16:07.770 --> 00:16:08.910 et cetera, et cetera. 382 00:16:08.910 --> 00:16:12.960 So pick charts with intention and yes, 383 00:16:12.960 --> 00:16:14.670 think about how you're gonna label 384 00:16:14.670 --> 00:16:17.340 and annotate your chart to enable your audience 385 00:16:17.340 --> 00:16:19.080 to do whatever you need them to do. 386 00:16:19.080 --> 00:16:20.970 You want to declutter and focus, 387 00:16:20.970 --> 00:16:24.180 get rid of all the junk so your audience can focus in 388 00:16:24.180 --> 00:16:27.566 on the parts that they really should be focusing in on. 389 00:16:27.566 --> 00:16:31.530 Then test it, like I said earlier, test the chart, 390 00:16:31.530 --> 00:16:32.490 run it past somebody. 391 00:16:32.490 --> 00:16:34.080 Show it to them and say hey, 392 00:16:34.080 --> 00:16:36.279 what did you learn from this? 393 00:16:36.279 --> 00:16:37.112 What do you understand? 394 00:16:37.112 --> 00:16:38.619 What don't you understand? 395 00:16:38.619 --> 00:16:39.452 Are you confused about anything? 396 00:16:39.452 --> 00:16:41.790 They'll answer you like humans will, 397 00:16:41.790 --> 00:16:43.320 and then you can use that information 398 00:16:43.320 --> 00:16:45.003 to improve your design.