WEBVTT 1 00:00:00.060 --> 00:00:02.850 All right, a little bit more about chart selection here. 2 00:00:02.850 --> 00:00:06.810 I wanna talk about some specific like bits of research 3 00:00:06.810 --> 00:00:08.520 that can guide some of our decisions 4 00:00:08.520 --> 00:00:10.980 as we're picking charts, okay? 5 00:00:10.980 --> 00:00:15.810 Number one, this is a good thing to be aware of, 6 00:00:15.810 --> 00:00:18.360 and it's not just about line charts which it says here. 7 00:00:18.360 --> 00:00:19.950 That was what the study was about, but it actually, 8 00:00:19.950 --> 00:00:22.290 it applies to line charts, scatter plots 9 00:00:22.290 --> 00:00:24.300 and probably other charts as well. 10 00:00:24.300 --> 00:00:28.740 It is a good best practice that the independent variable, 11 00:00:28.740 --> 00:00:33.000 the cause of something always goes on the x axis, 12 00:00:33.000 --> 00:00:34.980 the horizontal axis, 13 00:00:34.980 --> 00:00:39.658 and that the effect, the result, goes on the Y axis. 14 00:00:39.658 --> 00:00:43.560 So that's why time always flows left to right. 15 00:00:43.560 --> 00:00:48.318 Time happens, and as time goes by, I lose more hair, right? 16 00:00:48.318 --> 00:00:51.480 The cause is the time, not the other way around. 17 00:00:51.480 --> 00:00:53.040 It's not that my hair disappears, 18 00:00:53.040 --> 00:00:54.960 and then therefore time happens, right? 19 00:00:54.960 --> 00:00:58.290 So cause down here, effect going up and down. 20 00:00:58.290 --> 00:01:03.290 Good, but this research found there's an exception. 21 00:01:03.510 --> 00:01:05.520 You would never do it that way 22 00:01:05.520 --> 00:01:10.520 if the cause was about highness, bigness, more-ness 23 00:01:12.660 --> 00:01:14.280 you would never say 24 00:01:14.280 --> 00:01:17.610 the higher I go on the mountain horizontally, 25 00:01:17.610 --> 00:01:19.470 the colder it gets vertically. 26 00:01:19.470 --> 00:01:20.580 That's just weird. 
27 00:01:20.580 --> 00:01:21.900 And this research proved it. 28 00:01:21.900 --> 00:01:24.720 People are confused, disoriented by that. 29 00:01:24.720 --> 00:01:25.670 So this also aligns 30 00:01:26.829 --> 00:01:27.662 with what we've talked about a couple of times already here. 31 00:01:27.662 --> 00:01:30.570 That's another reason why low numbers go down here. 32 00:01:30.570 --> 00:01:31.800 High numbers go up here. 33 00:01:31.800 --> 00:01:33.960 We always go low to high that way. 34 00:01:33.960 --> 00:01:35.010 And the same thing left to right 35 00:01:35.010 --> 00:01:37.080 because time flows left to right. 36 00:01:37.080 --> 00:01:39.210 Okay, so that's a good one to know. 37 00:01:39.210 --> 00:01:41.572 This one I mentioned just in the last video I believe. 38 00:01:41.572 --> 00:01:44.790 When we see, or maybe in the prior video, 39 00:01:44.790 --> 00:01:49.790 when we see bars, we instinctively compare them, 40 00:01:49.890 --> 00:01:51.600 research has shown. 41 00:01:51.600 --> 00:01:53.820 So if you want your audience to compare 42 00:01:53.820 --> 00:01:56.760 those month to month values, sure, maybe you can pick bars. 43 00:01:56.760 --> 00:01:58.530 If you want them to think about the overall trend, 44 00:01:58.530 --> 00:02:00.540 show them a line. 45 00:02:00.540 --> 00:02:01.500 Simple. 46 00:02:01.500 --> 00:02:02.430 I also mentioned this one. 47 00:02:02.430 --> 00:02:04.830 Correlation, you cannot beat a scatter plot 48 00:02:04.830 --> 00:02:06.240 to show correlation. 49 00:02:06.240 --> 00:02:08.190 So not that there aren't other charts. 50 00:02:08.190 --> 00:02:10.453 There's one called parallel sets that also works, 51 00:02:10.453 --> 00:02:14.967 parallel coordinates, sorry, but that's a weird chart type. 52 00:02:14.967 --> 00:02:16.560 If you're gonna show correlation, 53 00:02:16.560 --> 00:02:18.090 you're probably gonna use a scatter plot. 54 00:02:18.090 --> 00:02:19.350 There are others, like I said. 
55 00:02:19.350 --> 00:02:20.370 You can actually do, 56 00:02:20.370 --> 00:02:21.960 essentially a map can show the correlation 57 00:02:21.960 --> 00:02:24.315 between geography and another variable obviously. 58 00:02:24.315 --> 00:02:26.220 But, whatever. 59 00:02:26.220 --> 00:02:29.400 Scatter plot's great for just numeric correlation. 60 00:02:29.400 --> 00:02:31.560 Okay, this one. 61 00:02:31.560 --> 00:02:33.360 We talked about small multiples in the last video, 62 00:02:33.360 --> 00:02:36.060 the idea that you definitely can, research has shown, 63 00:02:36.060 --> 00:02:37.860 and this being one of the research studies, 64 00:02:37.860 --> 00:02:39.780 show people small multiples, 65 00:02:39.780 --> 00:02:40.680 and they'll be very accurate at comparing 66 00:02:40.680 --> 00:02:42.450 those different little charts. 67 00:02:42.450 --> 00:02:43.830 What's interesting about this one 68 00:02:43.830 --> 00:02:48.450 is they compared small multiples to animation 69 00:02:48.450 --> 00:02:52.920 and to essentially traced static graphics. 70 00:02:52.920 --> 00:02:56.010 And as you can see here, the traced static graphics, 71 00:02:56.010 --> 00:02:58.200 especially when they're all piled on top of each other, 72 00:02:58.200 --> 00:02:59.656 not very effective. 73 00:02:59.656 --> 00:03:02.520 Animation, people loved it. 74 00:03:02.520 --> 00:03:04.170 And that's worth something. 75 00:03:04.170 --> 00:03:05.490 Engagement matters. 76 00:03:05.490 --> 00:03:07.115 I use animation all the time, 77 00:03:07.115 --> 00:03:09.990 but if I really want my audience 78 00:03:09.990 --> 00:03:14.160 to very accurately compare two things to each other 79 00:03:14.160 --> 00:03:16.200 as they've changed over time or whatever, 80 00:03:16.200 --> 00:03:18.420 I might use small multiples instead. 81 00:03:18.420 --> 00:03:21.810 So you know, be cautious, and again, try it. 82 00:03:21.810 --> 00:03:23.430 Create an animation if you have time. 
83 00:03:23.430 --> 00:03:25.211 Create the animation 84 00:03:25.211 --> 00:03:26.070 and create the small multiples 85 00:03:26.070 --> 00:03:27.360 and throw it past somebody. 86 00:03:27.360 --> 00:03:29.250 Ask them what they think. 87 00:03:29.250 --> 00:03:30.240 Aesthetics matter. 88 00:03:30.240 --> 00:03:32.580 This is related sort of to what we were just talking about. 89 00:03:32.580 --> 00:03:34.440 Engagement matters, okay? 90 00:03:34.440 --> 00:03:36.990 I'm gonna make sexy eye candy anytime I can 91 00:03:36.990 --> 00:03:38.010 because that does matter. 92 00:03:38.010 --> 00:03:40.410 It's gonna attract my audience and keep them engaged. 93 00:03:40.410 --> 00:03:41.640 Very important. 94 00:03:41.640 --> 00:03:43.320 Also, aesthetics. 95 00:03:43.320 --> 00:03:44.490 Very important. 96 00:03:44.490 --> 00:03:48.540 This research, they took a bunch of hierarchical data, 97 00:03:48.540 --> 00:03:50.194 you know, parents, children 98 00:03:50.194 --> 00:03:51.750 and children of children type of data, 99 00:03:51.750 --> 00:03:54.480 and they visualized it a whole bunch of different ways. 100 00:03:54.480 --> 00:03:56.430 They used a dendrogram tree 101 00:03:56.430 --> 00:03:58.530 which is like your typical org chart. 102 00:03:58.530 --> 00:04:02.460 This one performed very well, no surprise. 103 00:04:02.460 --> 00:04:04.950 Typical way of seeing hierarchical data. 104 00:04:04.950 --> 00:04:07.590 They also visualized it in a whole bunch of other ways 105 00:04:07.590 --> 00:04:11.340 including this crazy weird 3D cylinder thing. 106 00:04:11.340 --> 00:04:13.954 I don't know what that is, but whatever, they tried it, 107 00:04:13.954 --> 00:04:17.820 and the dendrogram tree performed really well, the best. 
108 00:04:17.820 --> 00:04:20.580 Okay, and by the way, performance in data vis research 109 00:04:20.580 --> 00:04:25.230 is usually speed to understanding, accuracy of understanding 110 00:04:25.230 --> 00:04:28.143 and accuracy of memory of the data, okay? 111 00:04:29.580 --> 00:04:33.060 The other chart in this entire group of options 112 00:04:33.060 --> 00:04:36.420 that performed pretty much exactly as well 113 00:04:36.420 --> 00:04:41.070 as the dendrogram tree, the sunburst diagram. 114 00:04:41.070 --> 00:04:42.420 Remember a couple videos ago 115 00:04:42.420 --> 00:04:43.890 I whined about circular shapes 116 00:04:43.890 --> 00:04:46.200 and how badly they performed for a while. 117 00:04:46.200 --> 00:04:48.450 Turns out, I mean that's true, 118 00:04:48.450 --> 00:04:52.023 but it also turns out that it's not always true. 119 00:04:52.950 --> 00:04:56.550 In this research, they also asked people, what do you like? 120 00:04:56.550 --> 00:04:58.500 Guess which one they loved the most? 121 00:04:58.500 --> 00:04:59.373 The sunburst. 122 00:05:00.300 --> 00:05:02.765 So turns out if it's pretty, 123 00:05:02.765 --> 00:05:04.890 people will spend more time with it. 124 00:05:04.890 --> 00:05:07.170 They will learn how to read it, 125 00:05:07.170 --> 00:05:09.330 and it does not affect performance. 126 00:05:09.330 --> 00:05:11.010 Okay, or it doesn't degrade performance. 127 00:05:11.010 --> 00:05:13.200 In fact, I guess you could say it helps performance 128 00:05:13.200 --> 00:05:16.650 cause the sunburst shouldn't perform well on paper. 129 00:05:16.650 --> 00:05:17.970 So aesthetics matter. 130 00:05:17.970 --> 00:05:19.980 Make pretty stuff. 131 00:05:19.980 --> 00:05:21.000 All right. 132 00:05:21.000 --> 00:05:22.350 I mentioned this one 133 00:05:22.350 --> 00:05:23.790 towards the beginning of the course already. 134 00:05:23.790 --> 00:05:25.290 I'll say it again. 135 00:05:25.290 --> 00:05:27.300 Titles matter. 
136 00:05:27.300 --> 00:05:29.550 People do fixate on the title, 137 00:05:29.550 --> 00:05:33.690 and if the title actually says something about your data, 138 00:05:33.690 --> 00:05:35.250 your audience is more likely 139 00:05:35.250 --> 00:05:37.320 to accurately remember your data. 140 00:05:37.320 --> 00:05:41.280 So boy, in this course you will definitely be critiqued 141 00:05:41.280 --> 00:05:43.800 if you don't have good titles on your slides 142 00:05:43.800 --> 00:05:45.420 and for your overall story, et cetera. 143 00:05:45.420 --> 00:05:48.150 It's really important, especially your slide titles, okay? 144 00:05:48.150 --> 00:05:50.400 And by the way, we don't need to have redundant titles. 145 00:05:50.400 --> 00:05:52.650 I don't need to have a slide title that says one thing 146 00:05:52.650 --> 00:05:55.260 and then a chart title that says kind of the same thing 147 00:05:55.260 --> 00:05:56.820 or worst case, like it just says nothing 148 00:05:56.820 --> 00:05:58.200 like survey results. 149 00:05:58.200 --> 00:05:59.220 Just skip the chart titles. 150 00:05:59.220 --> 00:06:01.740 The slide title can essentially act as your chart title, 151 00:06:01.740 --> 00:06:03.093 but say something with it. 152 00:06:03.930 --> 00:06:08.580 Also, we should use semantically resonant colors 153 00:06:08.580 --> 00:06:11.010 when relevant. 154 00:06:11.010 --> 00:06:12.090 What does that mean? 155 00:06:12.090 --> 00:06:13.080 Well, look at this. 156 00:06:13.080 --> 00:06:17.486 We have apples, bananas, and blueberries. 157 00:06:17.486 --> 00:06:20.250 It's hard for me to say that 158 00:06:20.250 --> 00:06:22.270 cause the colors are wrong, right? 159 00:06:22.270 --> 00:06:24.430 The right colors would be apples, bananas 160 00:06:25.723 --> 00:06:26.556 and blueberries, right? 161 00:06:27.425 --> 00:06:29.250 So there's a measurable cognitive load 162 00:06:29.250 --> 00:06:31.410 when the colors don't match the thing. 
163 00:06:31.410 --> 00:06:34.980 So we should use the right color when relevant, 164 00:06:34.980 --> 00:06:37.470 and I would add when appropriate, right? 165 00:06:37.470 --> 00:06:39.150 So we're not gonna do that with gender, 166 00:06:39.150 --> 00:06:40.380 blue and pink, yay! 167 00:06:40.380 --> 00:06:44.190 No, we're not gonna do it with ethnicity, right? 168 00:06:44.190 --> 00:06:46.830 But when appropriate and relevant 169 00:06:46.830 --> 00:06:47.910 make the color match a thing. 170 00:06:47.910 --> 00:06:49.740 And I think I talked about that earlier 171 00:06:49.740 --> 00:06:50.970 a couple modules ago. 172 00:06:50.970 --> 00:06:53.740 You know, remember when I said purple is 173 00:06:54.783 --> 00:06:55.616 something other than a purple thing? 174 00:06:55.616 --> 00:06:57.300 Instead of saying purple is eggplant right? 175 00:06:57.300 --> 00:07:00.660 Okay, also colorblindness. 176 00:07:00.660 --> 00:07:02.550 Last but not least here today, 177 00:07:02.550 --> 00:07:07.283 colorblindness affects up to 8% of white males, 178 00:07:08.280 --> 00:07:09.870 which is why we talk about it 179 00:07:09.870 --> 00:07:12.240 cause there's a little bit of bias in the industry 180 00:07:12.240 --> 00:07:14.224 and has been for generations obviously, 181 00:07:14.224 --> 00:07:16.767 but it is up to 3% of the population overall. 182 00:07:16.767 --> 00:07:21.663 So it is a not insignificant number of people. 183 00:07:22.590 --> 00:07:25.680 So the most common form of colorblindness 184 00:07:25.680 --> 00:07:28.080 is called red-green colorblindness. 185 00:07:28.080 --> 00:07:31.020 Those people have difficulty distinguishing 186 00:07:31.020 --> 00:07:33.810 between shades of red and green. 187 00:07:33.810 --> 00:07:36.060 As an example, this was run through 188 00:07:36.060 --> 00:07:38.430 three different colorblindness filters. 189 00:07:38.430 --> 00:07:40.080 This is the red-green filter. 
190 00:07:40.080 --> 00:07:41.930 And clearly this shade of orange 191 00:07:41.930 --> 00:07:46.778 and this shade of green are pretty much indistinguishable. 192 00:07:46.778 --> 00:07:50.580 And by the way, even with a key, that's the idea. 193 00:07:50.580 --> 00:07:53.250 Even if I had a legend, I'm not gonna be able to tell 194 00:07:53.250 --> 00:07:55.410 these apart very easily. 195 00:07:55.410 --> 00:07:57.570 So when you create, let's say, a scatter plot 196 00:07:57.570 --> 00:07:59.400 with two categories of things, and you use red and green, 197 00:07:59.400 --> 00:08:02.580 good and bad, it's gonna be literally impossible 198 00:08:02.580 --> 00:08:05.040 to understand what they are, even with a key. 199 00:08:05.040 --> 00:08:07.680 Now if you dual encoded it with 200 00:08:07.680 --> 00:08:10.890 up and down arrows in addition to the color, okay? 201 00:08:10.890 --> 00:08:12.780 Now the red and green doesn't matter. 202 00:08:12.780 --> 00:08:15.810 And you could also, what we do in data vis, 203 00:08:15.810 --> 00:08:19.050 professionals now, we use red and blue instead. 204 00:08:19.050 --> 00:08:20.040 So don't use red and green. 205 00:08:20.040 --> 00:08:21.210 Use red and blue. 206 00:08:21.210 --> 00:08:23.763 That'll be safe to avoid this problem.
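
The dual-encoding idea from the end of the talk can be sketched in a few lines of Python. This is only an illustration under assumptions: the `dual_encode` helper, the two-entry palette, and the arrow-shaped marker codes are made up for this example, and the talk itself only says to pair color with up and down arrows and to prefer red and blue over red and green.

```python
# Hypothetical palette and marker lists, chosen only for illustration.
# Blue/red replaces the red/green pairing the talk warns against.
PALETTE = ["blue", "red"]
MARKERS = ["^", "v"]  # up and down triangles: a second, non-color channel


def dual_encode(categories):
    """Give each category a color AND a marker shape, so the categories
    stay distinguishable even if the two colors look alike to a viewer."""
    if len(categories) > min(len(PALETTE), len(MARKERS)):
        raise ValueError("not enough distinct color/marker pairs")
    return {
        name: {"color": color, "marker": marker}
        for name, color, marker in zip(categories, PALETTE, MARKERS)
    }


# Two categories, as in the scatter plot example from the talk.
encoding = dual_encode(["good", "bad"])
```

Each entry carries both channels, so a plotting call that accepts `color` and `marker` keyword arguments (matplotlib's `scatter` is one such call) could draw every category with its own shape as well as its own color.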