WEBVTT 1 00:00:00.090 --> 00:00:01.290 So I wanna show you an example 2 00:00:01.290 --> 00:00:04.410 of finding the So What in a data set. 3 00:00:04.410 --> 00:00:08.250 And what I'm looking at here is a data set 4 00:00:08.250 --> 00:00:10.560 of Olympic competitors. 5 00:00:10.560 --> 00:00:13.380 And this is from 1896 all the way 6 00:00:13.380 --> 00:00:17.087 through the modern Olympics, every Olympic competitor. 7 00:00:17.087 --> 00:00:19.200 And right now they're sorted by ID. 8 00:00:19.200 --> 00:00:22.200 This is like an ID column in the data set 9 00:00:22.200 --> 00:00:23.640 where this came from. 10 00:00:23.640 --> 00:00:27.000 I've filtered out a certain subset of the data. 11 00:00:27.000 --> 00:00:30.150 I'm not gonna go into why or who they are exactly, 12 00:00:30.150 --> 00:00:31.380 except to say this. 13 00:00:31.380 --> 00:00:33.420 When I was investigating this data, 14 00:00:33.420 --> 00:00:35.520 and you can see I have the person, the ID, 15 00:00:35.520 --> 00:00:39.150 the name of the person, their sex, age, height, weight, 16 00:00:39.150 --> 00:00:41.884 team, and like an abbreviation for their country, 17 00:00:41.884 --> 00:00:44.760 which games they competed in, in sports, 18 00:00:44.760 --> 00:00:47.070 and whether they won a medal or not. 19 00:00:47.070 --> 00:00:48.180 And I was looking at this data 20 00:00:48.180 --> 00:00:50.190 and look, there's an age column and this is 21 00:00:50.190 --> 00:00:51.023 what I filtered out. 22 00:00:51.023 --> 00:00:52.020 Essentially, there are some people 23 00:00:52.020 --> 00:00:53.370 who we don't know their age, 24 00:00:53.370 --> 00:00:55.779 so I removed them from the data set. 25 00:00:55.779 --> 00:01:00.660 But if I sort by age in descending order, 26 00:01:00.660 --> 00:01:03.840 oldest to youngest, I found this insight. 27 00:01:03.840 --> 00:01:04.980 Look at this.
28 00:01:04.980 --> 00:01:08.910 The oldest competitor in the Olympics, in history, 29 00:01:08.910 --> 00:01:11.940 was 97 years old. 30 00:01:11.940 --> 00:01:13.110 What? 31 00:01:13.110 --> 00:01:14.580 And there's a 96-year-old, and a bunch 32 00:01:14.580 --> 00:01:16.818 of 80-something-year-olds and seventies, like, 33 00:01:16.818 --> 00:01:19.350 so I just discovered this in the dataset. 34 00:01:19.350 --> 00:01:22.380 So I decided to explore age in the Olympics. 35 00:01:22.380 --> 00:01:24.030 That was like the premise, 36 00:01:24.030 --> 00:01:27.390 like just the category of exploration. 37 00:01:27.390 --> 00:01:29.340 And I, first I sorted. 38 00:01:29.340 --> 00:01:30.930 I did all the things I recommend you do 39 00:01:30.930 --> 00:01:32.100 in data analytics. 40 00:01:32.100 --> 00:01:34.563 Some summary statistics and some sorting. 41 00:01:34.563 --> 00:01:37.560 And what I quickly discovered 42 00:01:37.560 --> 00:01:39.720 and I'm gonna go over to the pivot tables here, 43 00:01:39.720 --> 00:01:43.170 was I looked for distribution of ages, okay? 44 00:01:43.170 --> 00:01:45.180 And listen, there was one 10-year-old, 45 00:01:45.180 --> 00:01:47.190 a bazillion years ago, maybe not a bazillion, 46 00:01:47.190 --> 00:01:48.720 I can't even remember when that one was. 47 00:01:48.720 --> 00:01:50.850 One 11-year-old, a bunch of 12-year-olds, 48 00:01:50.850 --> 00:01:52.170 but as you can see, the distribution 49 00:01:52.170 --> 00:01:54.360 sort of fattens out and you have a bunch 50 00:01:54.360 --> 00:01:55.980 of 20-something-year-olds, 51 00:01:55.980 --> 00:01:58.110 and then it starts to shrink again. 52 00:01:58.110 --> 00:02:00.690 And so essentially by the time you hit your thirties, 53 00:02:00.690 --> 00:02:03.840 mid thirties, the numbers get pretty darn small. 54 00:02:03.840 --> 00:02:06.870 Once you hit 40, they really get small, okay? 
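NOTE The steps described up to this point (removing competitors whose age is unknown, sorting oldest to youngest, and counting competitors at each age) could be sketched in Python roughly as follows. This is a minimal sketch, not the speaker's actual workflow, which appears to use pivot tables. The rows, the athlete names, and the column names `ID`, `Name`, and `Age` are hypothetical stand-ins.

```python
from collections import Counter

# Hypothetical sample rows; the real data set has one row per competitor.
rows = [
    {"ID": 1, "Name": "Athlete A", "Age": 24},
    {"ID": 2, "Name": "Athlete B", "Age": None},  # age unknown
    {"ID": 3, "Name": "Athlete C", "Age": 97},
    {"ID": 4, "Name": "Athlete D", "Age": 24},
    {"ID": 5, "Name": "Athlete E", "Age": 11},
    {"ID": 6, "Name": "Athlete F", "Age": 52},
]

# Step 1: remove competitors whose age is unknown.
known_age = [r for r in rows if r["Age"] is not None]

# Step 2: sort by age in descending order (oldest to youngest).
oldest_first = sorted(known_age, key=lambda r: r["Age"], reverse=True)

# Step 3: count competitors at each age, as a pivot table on Age would.
age_distribution = Counter(r["Age"] for r in known_age)

# Print the distribution from youngest to oldest.
for age in sorted(age_distribution):
    print(age, age_distribution[age])
```

Sorting surfaces the outliers at the top of the list, while the per-age counts show where the bulk of the competitors sit.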
55 00:02:06.870 --> 00:02:09.510 And so the basic premise then became, 56 00:02:09.510 --> 00:02:11.700 as someone who is over 50 years old, 57 00:02:11.700 --> 00:02:15.450 I was wondering whether or not, technically, 58 00:02:15.450 --> 00:02:16.920 my Olympic dreams were over. 59 00:02:16.920 --> 00:02:19.590 Is there any chance I could ever compete 60 00:02:19.590 --> 00:02:22.530 in the Olympics given that I'm over 50? 61 00:02:22.530 --> 00:02:24.210 So that became the premise. 62 00:02:24.210 --> 00:02:27.030 And I explored this data in a whole bunch of different ways. 63 00:02:27.030 --> 00:02:28.440 I found a lot of interesting insights. 64 00:02:28.440 --> 00:02:30.960 I looked at every single sport that's been around 65 00:02:30.960 --> 00:02:34.710 for at least five Olympics, counted the number 66 00:02:34.710 --> 00:02:37.710 of athletes who competed in each one of those sports. 67 00:02:37.710 --> 00:02:40.350 Found some interesting insights like, golf 68 00:02:40.350 --> 00:02:43.710 and karate haven't been around for five Olympics. 69 00:02:43.710 --> 00:02:45.840 And some other sports had been around, 70 00:02:45.840 --> 00:02:48.040 but maybe just barely for over five. 71 00:02:48.040 --> 00:02:51.180 Some that were around for five, or maybe not five, 72 00:02:51.180 --> 00:02:54.630 I can't remember, in earlier years, but aren't around now. 73 00:02:54.630 --> 00:02:55.950 Long story short, I also realized 74 00:02:55.950 --> 00:02:57.120 that a lot of my data wasn't that good 75 00:02:57.120 --> 00:02:59.310 before 1980, so I sort of narrowed the focus 76 00:02:59.310 --> 00:03:02.251 into the modern age of 1980 and beyond. 77 00:03:02.251 --> 00:03:04.290 Discovered some interesting patterns 78 00:03:04.290 --> 00:03:08.610 when I looked at the average age per sport. 79 00:03:08.610 --> 00:03:12.310 In alpine skiing, the average age is 23 point something. 80 00:03:12.310 --> 00:03:15.450 In like gymnastics, they're very young on average.
81 00:03:15.450 --> 00:03:18.393 Rhythmic gymnastics, they're like 18, 19 years old. 82 00:03:18.393 --> 00:03:21.011 Other sports, higher average age. 83 00:03:21.011 --> 00:03:23.790 I discovered some other insights as I played with the data. 84 00:03:23.790 --> 00:03:27.548 I found some sports where like, team sports, 85 00:03:27.548 --> 00:03:31.050 where if they don't have at least some older people 86 00:03:31.050 --> 00:03:32.252 they're less likely to win. 87 00:03:32.252 --> 00:03:33.831 All kinds of interesting things. 88 00:03:33.831 --> 00:03:36.171 But eventually what I really narrowed in on was 89 00:03:36.171 --> 00:03:39.450 if you want to compete in the Olympics 90 00:03:39.450 --> 00:03:40.708 and you're beyond the age of 50, 91 00:03:40.708 --> 00:03:42.660 it's all about equestrianism. 92 00:03:42.660 --> 00:03:44.022 That's really what it came down to. 93 00:03:44.022 --> 00:03:46.611 There are a couple of others, for instance, 94 00:03:46.611 --> 00:03:50.400 some of these shooting sports like, archery 95 00:03:50.400 --> 00:03:54.120 and shooting specifically, actually maybe even not archery, 96 00:03:54.120 --> 00:03:55.290 this is a different table. 97 00:03:55.290 --> 00:03:58.703 But long story short, the primary issue was all 98 00:03:58.703 --> 00:04:00.870 about equestrianism. 99 00:04:00.870 --> 00:04:02.910 And so I investigated the data a whole bunch 100 00:04:02.910 --> 00:04:04.230 of different ways. 101 00:04:04.230 --> 00:04:07.830 The end result was the data story that I created 102 00:04:07.830 --> 00:04:08.970 that you can see here. 103 00:04:08.970 --> 00:04:12.360 Are your Olympic dreams over at 50, okay? 104 00:04:12.360 --> 00:04:14.100 And I'll just show you the data story real quick. 105 00:04:14.100 --> 00:04:15.600 And you can find this if you Google 106 00:04:15.600 --> 00:04:17.790 for it or you can see the URL above. 
107 00:04:17.790 --> 00:04:19.567 It's a journalistic sort of a story, 108 00:04:19.567 --> 00:04:21.240 "Age is just a number." 109 00:04:21.240 --> 00:04:23.880 The idea is that your best years 110 00:04:23.880 --> 00:04:26.419 in whatever your profession might be, might come later 111 00:04:26.419 --> 00:04:30.140 in life, but not so much for athletes, usually. 112 00:04:30.140 --> 00:04:33.170 And so then I show that distribution 113 00:04:33.170 --> 00:04:36.240 of athletes' ages, right? 114 00:04:36.240 --> 00:04:39.720 So very few athletes over the age of 40, 115 00:04:39.720 --> 00:04:42.120 and especially over the age of 50, 116 00:04:42.120 --> 00:04:45.360 as I point out in the story, you'll notice that I stop 117 00:04:45.360 --> 00:04:49.473 at the age of 74 because what I forgot to mention, 118 00:04:50.760 --> 00:04:53.998 these 80-plus-year-olds in the data set, 119 00:04:53.998 --> 00:04:58.530 these were all competitors in the art competition. 120 00:04:58.530 --> 00:04:59.363 Yeah, well look at this one. 121 00:04:59.363 --> 00:05:01.500 Winslow Homer, very famous artist, 122 00:05:01.500 --> 00:05:05.493 competed in the Olympics when he was 96 years old. 123 00:05:06.360 --> 00:05:09.420 The art competition literally was a competition. 124 00:05:09.420 --> 00:05:11.460 You could win an Olympic medal as a sculptor, 125 00:05:11.460 --> 00:05:12.720 as a painter, et cetera. 126 00:05:12.720 --> 00:05:16.920 But that competition ended in, I think the 1940s, 127 00:05:16.920 --> 00:05:18.360 or maybe the '30s. 128 00:05:18.360 --> 00:05:20.370 And you know, obviously if you're old, 129 00:05:20.370 --> 00:05:21.203 you can compete in that. 130 00:05:21.203 --> 00:05:22.950 So I sort of took that out of the data set 131 00:05:22.950 --> 00:05:24.600 for this data story. 132 00:05:24.600 --> 00:05:26.090 So that's why the ages end at 74. 133 00:05:26.090 --> 00:05:29.682 But yeah, a 74-year-old competed, not as an artist.
134 00:05:29.682 --> 00:05:32.991 Anyways, when we remove the art competition 135 00:05:32.991 --> 00:05:36.210 from the bit, I looked at just a few little pieces 136 00:05:36.210 --> 00:05:37.043 of the data. 137 00:05:37.043 --> 00:05:39.540 The average age by sport making the point 138 00:05:39.540 --> 00:05:41.992 that a slight majority of sports, 139 00:05:41.992 --> 00:05:44.190 you're more likely to win a medal 140 00:05:44.190 --> 00:05:46.260 if you're above average age than 141 00:05:46.260 --> 00:05:49.110 if you're below the average age. 142 00:05:49.110 --> 00:05:53.550 But five of those have had a single 50-plus medalist 143 00:05:53.550 --> 00:05:54.540 since 1980. 144 00:05:54.540 --> 00:05:56.728 So I'm sort of taking that general story 145 00:05:56.728 --> 00:05:59.970 but then narrowing in on it to say yeah, but, right? 146 00:05:59.970 --> 00:06:02.460 In more recent years, only a few of those are 147 00:06:02.460 --> 00:06:03.990 really relevant for the story that we're talking 148 00:06:03.990 --> 00:06:05.340 about here today. 149 00:06:05.340 --> 00:06:06.600 And I looked at the data a few other ways. 150 00:06:06.600 --> 00:06:08.160 I'm not gonna show you this entire data story 151 00:06:08.160 --> 00:06:10.440 but the basic idea here is, 152 00:06:10.440 --> 00:06:12.000 I discovered some patterns, right? 153 00:06:12.000 --> 00:06:15.030 So in winter sports, you're not gonna compete 154 00:06:15.030 --> 00:06:15.900 if you're beyond 50. 155 00:06:15.900 --> 00:06:17.931 It's all about the summer sports. 156 00:06:17.931 --> 00:06:19.989 And like I said, it's really equestrianism 157 00:06:19.989 --> 00:06:21.829 and a couple of others. 158 00:06:21.829 --> 00:06:25.860 Long story short, I also showed 159 00:06:25.860 --> 00:06:28.680 how equestrian is the most geriatric sport, 160 00:06:28.680 --> 00:06:33.680 by far, when you look at the average age among competitors. 
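NOTE The average-age-per-sport comparison described here could be sketched in Python along the following lines. This is an illustrative sketch under stated assumptions, not the speaker's method: the column names (`Sport`, `Year`, `Games`, `Age`), the sport label `"Art Competitions"`, the sample rows, and the helper `average_age_by_sport` are all hypothetical. It also assumes the "at least five Olympics" rule is applied within the 1980-and-later window, which the talk does not specify.

```python
from collections import defaultdict
from statistics import mean


def average_age_by_sport(rows, since_year=1980, min_games=5,
                         excluded=("Art Competitions",)):
    """Average competitor age per sport, after the filters from the talk."""
    ages = defaultdict(list)   # sport -> list of competitor ages
    games = defaultdict(set)   # sport -> distinct Games the sport appeared in
    for r in rows:
        # Skip unknown ages, pre-1980 rows, and the art competition.
        if r["Age"] is None or r["Year"] < since_year or r["Sport"] in excluded:
            continue
        ages[r["Sport"]].append(r["Age"])
        games[r["Sport"]].add(r["Games"])
    # Keep only sports that appear in at least `min_games` distinct Games.
    return {s: mean(a) for s, a in ages.items() if len(games[s]) >= min_games}


# Hypothetical sample rows, far smaller than the real data set.
rows = [
    {"Sport": "Equestrianism", "Year": 1976, "Games": "1976 Summer", "Age": 40},
    {"Sport": "Equestrianism", "Year": 1984, "Games": "1984 Summer", "Age": 52},
    {"Sport": "Equestrianism", "Year": 1988, "Games": "1988 Summer", "Age": 48},
    {"Sport": "Gymnastics", "Year": 1984, "Games": "1984 Summer", "Age": 18},
    {"Sport": "Gymnastics", "Year": 1988, "Games": "1988 Summer", "Age": 20},
    {"Sport": "Golf", "Year": 2016, "Games": "2016 Summer", "Age": 30},
    {"Sport": "Art Competitions", "Year": 1932, "Games": "1932 Summer", "Age": 96},
]

# min_games is lowered to 2 only because the sample is tiny.
result = average_age_by_sport(rows, min_games=2)
oldest_sport = max(result, key=result.get)
print(result, oldest_sport)
```

Keeping the filters as parameters makes it easy to rerun the comparison with a different cutoff year or a different minimum number of Games.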
161 00:06:34.470 --> 00:06:37.267 And also the idea of all the quote, unquote, 162 00:06:37.267 --> 00:06:41.490 "Geriatric Olympics" delivered or won, 163 00:06:41.490 --> 00:06:43.950 the vast majority of them are in equestrianism, 164 00:06:43.950 --> 00:06:46.260 a few in some other sports, and on top of that, 165 00:06:46.260 --> 00:06:51.260 some people win multiple medals, even as older competitors. 166 00:06:51.750 --> 00:06:54.300 I then have a final data story, or part of the data story, 167 00:06:54.300 --> 00:06:55.890 where I give you the ability to read 168 00:06:55.890 --> 00:06:58.794 about each one of these older medal winners 169 00:06:58.794 --> 00:07:01.273 and learn a little bit more about them individually. 170 00:07:01.273 --> 00:07:04.890 Long story short, the true So What came about 171 00:07:04.890 --> 00:07:06.840 when I realized that A, this is a story 172 00:07:06.840 --> 00:07:10.230 about me really, could I compete in the Olympics 173 00:07:10.230 --> 00:07:12.300 being above the age of 50? 174 00:07:12.300 --> 00:07:14.010 What are the interesting little tidbits 175 00:07:14.010 --> 00:07:16.230 in the data that are about that? 176 00:07:16.230 --> 00:07:18.330 And some of the other insights that I found, 177 00:07:18.330 --> 00:07:20.272 such as the fact that team sports, 178 00:07:20.272 --> 00:07:22.410 you're slightly more likely to win a medal 179 00:07:22.410 --> 00:07:24.202 if you have a slightly older average age, 180 00:07:24.202 --> 00:07:28.110 wasn't really the story that I was talking about. 181 00:07:28.110 --> 00:07:30.600 That wasn't really a true So What for my story. 182 00:07:30.600 --> 00:07:31.560 It's interesting. 183 00:07:31.560 --> 00:07:33.780 It's definitely, could be its own story 184 00:07:33.780 --> 00:07:35.904 and probably should be done as a story on its own, 185 00:07:35.904 --> 00:07:38.403 but it wasn't really relevant for my story.