WEBVTT 1 00:00:00.570 --> 00:00:01.403 All right, hey, everybody. 2 00:00:01.403 --> 00:00:06.403 So I'm gonna introduce a tool to you all 3 00:00:07.020 --> 00:00:12.020 which is mind-bendingly amazing, and it's a tool 4 00:00:12.420 --> 00:00:15.450 that most people have never heard of, 5 00:00:15.450 --> 00:00:19.590 don't even know it exists, and yet it's one of the things 6 00:00:19.590 --> 00:00:22.590 you ever didn't learn, okay? 7 00:00:22.590 --> 00:00:26.100 And it's for data cleaning, it's for data transformation, 8 00:00:26.100 --> 00:00:28.980 and it's built into Excel. 9 00:00:28.980 --> 00:00:30.960 It's also built into Power BI. 10 00:00:30.960 --> 00:00:35.960 Now, caveat on that, it's built into Excel on the PC. 11 00:00:36.030 --> 00:00:39.210 There actually is a version of it now on Excel for the Mac, 12 00:00:39.210 --> 00:00:42.750 but it is not nearly as complete. 13 00:00:42.750 --> 00:00:44.010 You can barely do anything in it, 14 00:00:44.010 --> 00:00:47.310 so unfortunately, this is only for Windows users. 15 00:00:47.310 --> 00:00:52.310 So what it is, is if you go into the data tab in Excel, 16 00:00:53.400 --> 00:00:56.790 you have this button over here called get data. 17 00:00:56.790 --> 00:00:58.350 And as you can see, there are different places 18 00:00:58.350 --> 00:01:00.300 I can get data from, from Azure, 19 00:01:00.300 --> 00:01:02.670 from databases, all kinds of things, 20 00:01:02.670 --> 00:01:04.290 but I'm gonna get data from a file, 21 00:01:04.290 --> 00:01:08.970 and in this case it's a CSV, a comma-delimited text file. 22 00:01:08.970 --> 00:01:11.790 So I'm gonna say from that kind of a file, 23 00:01:11.790 --> 00:01:16.050 and it opens up a little dialogue box where I can choose 24 00:01:16.050 --> 00:01:20.190 where to get that file from, and that file lives 25 00:01:20.190 --> 00:01:23.853 on my computer, and I gotta sort of dig around for it, 26 00:01:28.000 --> 00:01:29.250 and so that file's right here. 27 00:01:29.250 --> 00:01:31.800 So I open it up and like I said, 28 00:01:31.800 --> 00:01:34.290 it opens up essentially this other window. 29 00:01:34.290 --> 00:01:37.350 This is the beginning of the Power Query experience, 30 00:01:37.350 --> 00:01:39.540 and as you can see, it tells me, you know, 31 00:01:39.540 --> 00:01:41.940 essentially this is like a formatting thing. 32 00:01:41.940 --> 00:01:44.370 It's using comma-delimited text, 33 00:01:44.370 --> 00:01:47.973 meaning every comma separates each field, 34 00:01:48.900 --> 00:01:50.580 and data type detection. 35 00:01:50.580 --> 00:01:51.750 We'll talk more about that in a second. 36 00:01:51.750 --> 00:01:53.700 I can either load this data and it'll bring it right 37 00:01:53.700 --> 00:01:57.420 into Excel, or I can click on transform data, 38 00:01:57.420 --> 00:01:59.040 and when I click on transform data, 39 00:01:59.040 --> 00:02:04.040 this is where you get the complete Power Query interface. 40 00:02:04.410 --> 00:02:08.643 This is where all the magic happens, so let me explain this. 41 00:02:09.540 --> 00:02:12.960 What this is is a tool, like I said, to transform your data. 42 00:02:12.960 --> 00:02:16.230 We're in the home tab up here in the upper left hand corner. 43 00:02:16.230 --> 00:02:19.290 Again, I can close and load, and this will take this data 44 00:02:19.290 --> 00:02:20.370 and pull it into Excel. 45 00:02:20.370 --> 00:02:23.010 That's great, I can refresh. 46 00:02:23.010 --> 00:02:25.860 If I happen to change that Excel file, 47 00:02:25.860 --> 00:02:28.835 or that text file rather and save it, 48 00:02:28.835 --> 00:02:31.860 I'm not gonna see any updates here until I hit refresh. 49 00:02:31.860 --> 00:02:33.720 Okay, fine, and there's some other stuff happening 50 00:02:33.720 --> 00:02:35.850 over here, which I'm not gonna talk about. 51 00:02:35.850 --> 00:02:38.490 But let's say as an example of data cleaning, 52 00:02:38.490 --> 00:02:41.520 let's say that I decided that I did want to get rid 53 00:02:41.520 --> 00:02:43.500 of this overview column. 54 00:02:43.500 --> 00:02:46.410 I just don't need all that text, it's just annoying. 55 00:02:46.410 --> 00:02:49.050 I can click the column, and I can click on this button 56 00:02:49.050 --> 00:02:50.485 over here, remove columns, 57 00:02:50.485 --> 00:02:54.603 and it's gonna remove that column, and it's gone. 58 00:02:55.470 --> 00:02:59.820 Now, that introduces an amazingly cool thing 59 00:02:59.820 --> 00:03:01.440 about Power Query. 60 00:03:01.440 --> 00:03:03.030 You may notice on the right hand side here, 61 00:03:03.030 --> 00:03:06.240 there's this applied steps box, 62 00:03:06.240 --> 00:03:08.670 and you can see remove columns shows up here. 63 00:03:08.670 --> 00:03:13.263 That's great, and it has a little X next to it, guess what? 64 00:03:14.100 --> 00:03:19.100 If I click that X, it's gonna undo that change. 65 00:03:19.140 --> 00:03:20.790 Now, that's not that interesting, 66 00:03:20.790 --> 00:03:24.660 because a lot of software has undo, but what's amazing 67 00:03:24.660 --> 00:03:28.710 about this applied steps thing is that every step, 68 00:03:28.710 --> 00:03:32.760 every change, every thing I do gets added to that list, 69 00:03:32.760 --> 00:03:35.580 and if I save this file, close it, and come back 70 00:03:35.580 --> 00:03:39.150 three months from now, I can undo those steps. 71 00:03:39.150 --> 00:03:41.340 I can change how those steps are done. 72 00:03:41.340 --> 00:03:44.490 I can even undo steps in the middle of the process, 73 00:03:44.490 --> 00:03:46.350 and not mess up the rest of the process, 74 00:03:46.350 --> 00:03:47.910 although there's a lot of caveats there. 75 00:03:47.910 --> 00:03:49.230 I might mess up the rest of the process, 76 00:03:49.230 --> 00:03:52.050 but I always have access to try to do that. 77 00:03:52.050 --> 00:03:56.310 So we're always saving these steps, it is amazing. 78 00:03:56.310 --> 00:03:59.550 All right, now, I'm gonna keep this column for now. 79 00:03:59.550 --> 00:04:01.410 Let's go and think about some of the things we noticed, 80 00:04:01.410 --> 00:04:05.640 a couple things, number one, in the released year, 81 00:04:05.640 --> 00:04:07.680 and by the way, if I click this little arrow here, 82 00:04:07.680 --> 00:04:08.730 it's like filtering, right? 83 00:04:08.730 --> 00:04:12.300 I can see what's going on here, and if I zoom now, 84 00:04:12.300 --> 00:04:16.950 remember there was this one movie that had PG as a rating. 85 00:04:16.950 --> 00:04:19.410 Now, that movie was "Apollo 13," 86 00:04:19.410 --> 00:04:21.840 and if I was gonna do this the right way, 87 00:04:21.840 --> 00:04:24.150 I would probably go and look up "Apollo 13," 88 00:04:24.150 --> 00:04:26.640 figure out what year it was released, and just fix it 89 00:04:26.640 --> 00:04:29.520 in the source data, or fix it here or whatever. 90 00:04:29.520 --> 00:04:31.440 For today, I'm just gonna filter that out. 91 00:04:31.440 --> 00:04:34.440 I just want to ignore movies that have a weird wrong answer 92 00:04:34.440 --> 00:04:37.560 there, but again, in a real analysis, 93 00:04:37.560 --> 00:04:40.050 you would probably just fix that value. 94 00:04:40.050 --> 00:04:42.420 So I filter the rows, so you can see that over here, 95 00:04:42.420 --> 00:04:44.940 and by the way, there's this little gear over here, 96 00:04:44.940 --> 00:04:48.810 so not only can I hit the X and undo that, remove that step, 97 00:04:48.810 --> 00:04:53.220 if I click the gear, it pops open this little dialogue box 98 00:04:53.220 --> 00:04:57.570 where essentially, I can tweak what that change was, 99 00:04:57.570 --> 00:05:02.570 so I can keep rows where released year does not equal PG. 100 00:05:03.630 --> 00:05:05.700 So I can change what's happening, and even do 101 00:05:05.700 --> 00:05:09.570 like multiple parts, multiple steps to that filtering, 102 00:05:09.570 --> 00:05:11.040 so that's pretty sweet to be realized. 103 00:05:11.040 --> 00:05:13.560 Some of these steps have the little gear, some don't, 104 00:05:13.560 --> 00:05:15.960 so you can just keep your eyes open for that. 105 00:05:15.960 --> 00:05:19.560 Let's say that I noticed something weird about the data, 106 00:05:19.560 --> 00:05:21.545 like I wanted to remove the top five rows 107 00:05:21.545 --> 00:05:22.650 for whatever reason. 108 00:05:22.650 --> 00:05:24.690 Imagine you had a data set that had a bunch of extra stuff 109 00:05:24.690 --> 00:05:26.460 in there and you had to get rid of 'em. 110 00:05:26.460 --> 00:05:29.790 There's a remove rows button, and if I click on that, 111 00:05:29.790 --> 00:05:32.400 there are different choices in here, remove top rows. 112 00:05:32.400 --> 00:05:34.410 Let me click on that, and then I can say, oh yeah, 113 00:05:34.410 --> 00:05:36.900 I wanna remove the top five rows. 114 00:05:36.900 --> 00:05:38.370 If I click okay, it's just gonna remove 115 00:05:38.370 --> 00:05:40.110 the top five, that's good. 116 00:05:40.110 --> 00:05:42.060 I'm not gonna actually do that now. 117 00:05:42.060 --> 00:05:44.310 I have another option in here, I have this run time. 118 00:05:44.310 --> 00:05:45.660 Remember we noticed this one? 119 00:05:45.660 --> 00:05:49.710 183 space MIN, that's not very useful. 120 00:05:49.710 --> 00:05:50.875 This is text, so how can I do numeric calculations 121 00:05:50.875 --> 00:05:54.510 with text? 122 00:05:54.510 --> 00:05:55.590 I can't. 123 00:05:55.590 --> 00:05:57.858 Now, I could fix this a bunch of different ways. 124 00:05:57.858 --> 00:06:00.810 One way is I could split the column, 125 00:06:00.810 --> 00:06:02.670 and this will be familiar for Excel users. 126 00:06:02.670 --> 00:06:07.620 I can say split the column, and I can split by delimiter, 127 00:06:07.620 --> 00:06:10.050 or by the number of characters, or by other things, 128 00:06:10.050 --> 00:06:11.970 but in this case, I'm gonna use a delimiter, 129 00:06:11.970 --> 00:06:14.220 and the delimiter I'm gonna use is a space. 130 00:06:14.220 --> 00:06:16.050 So in other words, take this column, 131 00:06:16.050 --> 00:06:19.980 break it into two columns, splitting on any spaces 132 00:06:19.980 --> 00:06:21.510 that you see in the text, 133 00:06:21.510 --> 00:06:24.690 and you'll see split at the left-most, 134 00:06:24.690 --> 00:06:28.110 or you can also split every time you find that delimiter, 135 00:06:28.110 --> 00:06:30.300 which we'll check out in a moment. 136 00:06:30.300 --> 00:06:31.530 I'm gonna split at the left-most. 137 00:06:31.530 --> 00:06:34.440 I'm gonna say, okay, and yeah, it created a column called 138 00:06:34.440 --> 00:06:36.270 runtime one with just the numbers, 139 00:06:36.270 --> 00:06:39.600 a column called runtime two with MIN, 140 00:06:39.600 --> 00:06:42.060 and now I could just delete this column if I wanted to, 141 00:06:42.060 --> 00:06:43.416 but I'm gonna undo that, 142 00:06:43.416 --> 00:06:47.084 and I'm gonna do this a different way. 143 00:06:47.084 --> 00:06:50.970 Another thing I can do is just replace the values. 144 00:06:50.970 --> 00:06:53.880 See this button over here, replace values? 145 00:06:53.880 --> 00:06:58.710 And what if I find space MIN, and replaced with nothing, 146 00:06:58.710 --> 00:07:01.470 and say okay, and it has the same effect, 147 00:07:01.470 --> 00:07:02.755 although you will notice 148 00:07:02.755 --> 00:07:07.260 this is still actually being treated as text. 149 00:07:07.260 --> 00:07:08.790 How do I know that? 150 00:07:08.790 --> 00:07:12.000 Because it's left aligned numbers. 151 00:07:12.000 --> 00:07:14.670 Let's see if we can find any that it is properly recognizing 152 00:07:14.670 --> 00:07:17.670 as numbers, here you go, numbers are right aligned. 153 00:07:17.670 --> 00:07:20.370 So I can recognize the data type is incorrect, 154 00:07:20.370 --> 00:07:22.290 and by the way, I noticed that the same thing with year, 155 00:07:22.290 --> 00:07:24.000 'cause remember we had that PG in there. 156 00:07:24.000 --> 00:07:25.773 It got confused, okay? 157 00:07:26.940 --> 00:07:29.790 Brings up an interesting point about Power Query, 158 00:07:29.790 --> 00:07:31.530 and there's a little text down here that says it, 159 00:07:31.530 --> 00:07:35.760 column profiling based on top 1,000 rows. 160 00:07:35.760 --> 00:07:37.890 Now, this data set only has 1,000 rows of data, 161 00:07:37.890 --> 00:07:42.090 so it's basing the "profiling" based on all the rows. 162 00:07:42.090 --> 00:07:44.220 But if I had 100,000 rows of data, 163 00:07:44.220 --> 00:07:46.830 it would only be figuring out things like data types 164 00:07:46.830 --> 00:07:49.140 based on the first 1,000 rows, 165 00:07:49.140 --> 00:07:52.230 so sometimes Power Query gets data types wrong. 166 00:07:52.230 --> 00:07:54.450 In this case, it thinks it's text, 167 00:07:54.450 --> 00:07:55.830 and see this little button over here, 168 00:07:55.830 --> 00:07:57.453 and see how it highlights when I roll over it? 169 00:07:57.453 --> 00:07:59.790 ABC, it thinks that this is text. 170 00:07:59.790 --> 00:08:03.060 I can click on this and change it to, in this case, 171 00:08:03.060 --> 00:08:06.323 a year, a whole number, and boom, 172 00:08:06.323 --> 00:08:09.690 it's now a whole number and it fixed it. 173 00:08:09.690 --> 00:08:12.300 You can even see it visually because it's right aligning it. 174 00:08:12.300 --> 00:08:14.370 I'm gonna do the same thing with run time, 175 00:08:14.370 --> 00:08:15.360 whole numbers, okay? 176 00:08:15.360 --> 00:08:17.400 I don't think there are any decimals in there. 177 00:08:17.400 --> 00:08:21.180 Now, we have our genres, and like I mentioned before, 178 00:08:21.180 --> 00:08:24.270 I don't like the fact that it's one column 179 00:08:24.270 --> 00:08:27.390 with comma-delimited values. 180 00:08:27.390 --> 00:08:32.010 I want to split that one by delimiter, in this case commas, 181 00:08:32.010 --> 00:08:34.580 and I want it to happen for each occurrence. 182 00:08:34.580 --> 00:08:37.710 And when I do that, it created three columns, 183 00:08:37.710 --> 00:08:40.500 genre one, two, and three. 184 00:08:40.500 --> 00:08:43.170 Now, I didn't know how many it would create, 185 00:08:43.170 --> 00:08:46.410 but I now have confirmed what I saw visually 186 00:08:46.410 --> 00:08:49.560 that there are, in fact, never more than three genres. 187 00:08:49.560 --> 00:08:53.040 If there were 176 of them, and like if one column 188 00:08:53.040 --> 00:08:57.210 or one value had literally 176 genres separated by columns, 189 00:08:57.210 --> 00:08:59.460 it would've created 176 columns, okay? 190 00:08:59.460 --> 00:09:01.950 So you know, it's kind of neat that you don't have to know 191 00:09:01.950 --> 00:09:03.690 the number to do that. 192 00:09:03.690 --> 00:09:04.950 IMDB rating looks good. 193 00:09:04.950 --> 00:09:07.170 The overview I'm gonna keep for now. 194 00:09:07.170 --> 00:09:10.320 Meta score, I have a bunch of blank values. 195 00:09:10.320 --> 00:09:14.250 Do I wanna filter those out maybe? 196 00:09:14.250 --> 00:09:16.740 I think for now I'm gonna leave those. 197 00:09:16.740 --> 00:09:20.370 Directors and the actors, I'm gonna leave those for now. 198 00:09:20.370 --> 00:09:22.530 Number of votes and gross, 199 00:09:22.530 --> 00:09:24.870 I'm not gonna touch those for now either. 200 00:09:24.870 --> 00:09:27.840 Now, I wanna show you a couple of other things. 201 00:09:27.840 --> 00:09:31.740 Number one, there are other buttons available in here 202 00:09:31.740 --> 00:09:33.117 which are very, very powerful, 203 00:09:33.117 --> 00:09:35.597 and I'm gonna walk through the whole interface. 204 00:09:35.597 --> 00:09:40.500 One of them is this use first row as headers. 205 00:09:40.500 --> 00:09:42.450 Now, we didn't have to do that. 206 00:09:42.450 --> 00:09:45.706 In this particular data set, it's a CSV file, 207 00:09:45.706 --> 00:09:50.493 and the first row in the comma-delimited text file, 208 00:09:52.440 --> 00:09:56.730 in fact, if we look back at it, row number one had 209 00:09:56.730 --> 00:09:59.960 the actual headers, the actual field names. 210 00:09:59.960 --> 00:10:04.960 Now, Power Query was smart enough to actually use those 211 00:10:05.261 --> 00:10:06.720 as the column headers. 212 00:10:06.720 --> 00:10:08.850 It converted them into column headers 213 00:10:08.850 --> 00:10:11.163 rather than as a first row of data. 214 00:10:12.150 --> 00:10:14.820 Sometimes that doesn't happen. 215 00:10:14.820 --> 00:10:17.460 Sometimes Power Query gets confused, 216 00:10:17.460 --> 00:10:20.310 and your column headers will actually read column one, 217 00:10:20.310 --> 00:10:22.740 column two, column three, and your first row 218 00:10:22.740 --> 00:10:26.340 of data will actually have these values down here. 219 00:10:26.340 --> 00:10:29.220 If you ever see that, click on use first row as headers, 220 00:10:29.220 --> 00:10:32.880 and it essentially does what's called promote the headers. 221 00:10:32.880 --> 00:10:34.156 And you can see in fact, in the applied steps, 222 00:10:34.156 --> 00:10:36.330 this happened up here. 223 00:10:36.330 --> 00:10:38.659 It just happened automatically, all right? 224 00:10:38.659 --> 00:10:42.210 I'm gonna talk about merging queries in a second, 225 00:10:42.210 --> 00:10:44.880 and nothing else here we need to talk about at the moment. 226 00:10:44.880 --> 00:10:47.820 There's also the transform tab up here, 227 00:10:47.820 --> 00:10:49.770 where I can do some of those same things I just saw before, 228 00:10:49.770 --> 00:10:51.270 use first row as headers. 229 00:10:51.270 --> 00:10:54.510 I can also do what's called transpose the rows, 230 00:10:54.510 --> 00:10:57.180 literally turn rows into columns and columns into rows, 231 00:10:57.180 --> 00:10:58.890 which sometimes is useful. 232 00:10:58.890 --> 00:11:00.240 Reverse rows, I don't know why 233 00:11:00.240 --> 00:11:02.366 you would need to do that, but you can. 234 00:11:02.366 --> 00:11:05.610 I can also change the data type, so like I clicked 235 00:11:05.610 --> 00:11:07.680 this button up here to change a data type before, 236 00:11:07.680 --> 00:11:10.890 I can change this to something else if I wanted to. 237 00:11:10.890 --> 00:11:14.760 I could also do it by clicking this button right up here. 238 00:11:14.760 --> 00:11:16.230 Same idea, so sometimes you can do things 239 00:11:16.230 --> 00:11:17.820 in multiple different ways, in fact, 240 00:11:17.820 --> 00:11:20.250 a lot of things I can do in Power Query, if I click 241 00:11:20.250 --> 00:11:22.830 on the column, if I right click on it, 242 00:11:22.830 --> 00:11:25.710 or control click maybe, depending on what your computer is, 243 00:11:25.710 --> 00:11:26.880 I also have access to a lot 244 00:11:26.880 --> 00:11:28.280 of different things I can do here. 245 00:11:28.280 --> 00:11:29.400 As you can see, by the way, 246 00:11:29.400 --> 00:11:32.790 I can do things like remove duplicates, remove errors. 247 00:11:32.790 --> 00:11:35.580 Sometimes Power Query will show you it's detected 248 00:11:35.580 --> 00:11:38.280 an error and do things that way, 249 00:11:38.280 --> 00:11:41.550 so always multiple ways to do things. 250 00:11:41.550 --> 00:11:44.460 What else, I can do things like fill values. 251 00:11:44.460 --> 00:11:45.780 So let's, I'm gonna do that for a second, 252 00:11:45.780 --> 00:11:47.400 and then I'm gonna undo it. 253 00:11:47.400 --> 00:11:52.290 Let's say I noticed, you know, I have in these two rows, 254 00:11:53.370 --> 00:11:56.100 sometimes you get Excel data where somebody sets 255 00:11:56.100 --> 00:11:58.110 the first row with a value, 256 00:11:58.110 --> 00:12:01.560 and that's implied that it should be carried down. 257 00:12:01.560 --> 00:12:03.030 In this case, that's not how this one works, 258 00:12:03.030 --> 00:12:04.252 but you know, sometimes you see that. 259 00:12:04.252 --> 00:12:07.230 Essentially, they use like a column as like a header, 260 00:12:07.230 --> 00:12:09.240 and then the next column over has more values in it, 261 00:12:09.240 --> 00:12:11.520 and the header is supposed to apply to all of those. 262 00:12:11.520 --> 00:12:12.743 It's like a formatting thing. 263 00:12:12.743 --> 00:12:14.626 If that was the case here, 264 00:12:14.626 --> 00:12:17.310 I could use this fill down button, 265 00:12:17.310 --> 00:12:19.050 and watch what happens, it fills down. 266 00:12:19.050 --> 00:12:21.960 Any blanks anywhere in the entire data set, 267 00:12:21.960 --> 00:12:24.450 it'll take anything above those blanks. 268 00:12:24.450 --> 00:12:27.750 It'll find it and fill it down just for those blanks, 269 00:12:27.750 --> 00:12:29.550 so definitely not what you wanna do 270 00:12:29.550 --> 00:12:32.010 with this data set today here, 271 00:12:32.010 --> 00:12:34.563 but a really useful thing for certain data sets. 272 00:12:36.570 --> 00:12:40.410 A very, very powerful and important feature built 273 00:12:40.410 --> 00:12:42.150 into Power Query which we are going 274 00:12:42.150 --> 00:12:45.783 to use today is called unpivoting, 275 00:12:47.460 --> 00:12:50.250 so the basic idea here is this. 276 00:12:50.250 --> 00:12:55.250 A lot of data analytics is much easier to do if each row 277 00:12:57.690 --> 00:13:02.690 of data is like a self-contained thing. 278 00:13:04.800 --> 00:13:06.750 One way that it's described is like 279 00:13:06.750 --> 00:13:09.210 it's more transactional data, which makes sense. 280 00:13:09.210 --> 00:13:10.380 It's sort of like a transaction. 281 00:13:10.380 --> 00:13:15.000 So if I was looking at data for a store, 282 00:13:15.000 --> 00:13:17.940 each transaction is someone bought something, 283 00:13:17.940 --> 00:13:19.770 and I want that separated so I can compare 284 00:13:19.770 --> 00:13:22.080 this person's transactions to this person's transactions, 285 00:13:22.080 --> 00:13:24.120 or the transaction for this hour compared 286 00:13:24.120 --> 00:13:27.510 to the transactions for another hour, et cetera. 287 00:13:27.510 --> 00:13:32.510 When we have a row that has multiple things 288 00:13:33.600 --> 00:13:36.330 like our genres, and by the way, 289 00:13:36.330 --> 00:13:41.010 our people, broken into separate columns, 290 00:13:41.010 --> 00:13:45.033 sometimes that makes it very difficult to do analysis. 291 00:13:46.380 --> 00:13:50.100 So what I'm gonna do is I'm gonna do what's called 292 00:13:50.100 --> 00:13:51.840 unpivot those columns. 293 00:13:51.840 --> 00:13:55.020 Essentially, I'm gonna create a row of data 294 00:13:55.020 --> 00:13:57.060 for every one of those, so in other words, 295 00:13:57.060 --> 00:14:01.290 for this movie "Dil Chahta Hai," 296 00:14:01.290 --> 00:14:02.850 I'm sure I mispronounced that. 297 00:14:02.850 --> 00:14:05.300 There are some Bollywood movies in this data set. 298 00:14:06.540 --> 00:14:09.420 It's gonna essentially generate three rows, 299 00:14:09.420 --> 00:14:13.080 one that says this is a comedy, one that says it's drama, 300 00:14:13.080 --> 00:14:15.360 one that says it's a romance. 301 00:14:15.360 --> 00:14:18.330 This will make it easier later to figure out 302 00:14:18.330 --> 00:14:20.970 which genres are associated with better movies. 303 00:14:20.970 --> 00:14:23.340 This isn't always required, 304 00:14:23.340 --> 00:14:26.040 but it can make it a lot easier to do things like actually, 305 00:14:26.040 --> 00:14:28.260 by the way, creating what's called pivot tables. 306 00:14:28.260 --> 00:14:29.940 That's why this is called unpivot. 307 00:14:29.940 --> 00:14:33.240 We're unpivoting the data and making it more transactional. 308 00:14:33.240 --> 00:14:35.160 So all we have to do to do this is select 309 00:14:35.160 --> 00:14:38.790 those three columns, shift click on the first one, 310 00:14:38.790 --> 00:14:41.010 regular click and then shift click for the last one, 311 00:14:41.010 --> 00:14:43.530 and it'll select all the columns in between. 312 00:14:43.530 --> 00:14:46.440 And then I can say unpivot columns, 313 00:14:46.440 --> 00:14:48.870 and it does exactly what I just said it would do. 314 00:14:48.870 --> 00:14:52.260 It creates three rows for this single movie. 315 00:14:52.260 --> 00:14:54.210 All the other data is repeated. 316 00:14:54.210 --> 00:14:55.500 Obviously, the release year is the same, 317 00:14:55.500 --> 00:14:57.870 the run time's the same, et cetera, et cetera, et cetera, 318 00:14:57.870 --> 00:15:01.520 but then each one has a single, where did it go? 319 00:15:01.520 --> 00:15:05.550 Oh, I think it maybe dropped it at the end. 320 00:15:05.550 --> 00:15:09.030 Yeah, a single now genre. 321 00:15:09.030 --> 00:15:12.690 Now you'll notice it has the attribute, in other words, 322 00:15:12.690 --> 00:15:16.380 the column header gets sort of put over here, 323 00:15:16.380 --> 00:15:18.750 and then it has the actual value, 324 00:15:18.750 --> 00:15:20.670 and it calls it attribute and value. 325 00:15:20.670 --> 00:15:24.120 Now, in this case, I don't need to save this column, 326 00:15:24.120 --> 00:15:25.830 it's genre 1, 2, 3, 1, 2, 3, 327 00:15:25.830 --> 00:15:28.290 'cause these numeric things have no bearing. 328 00:15:28.290 --> 00:15:32.280 So I'm just gonna right click on that or control click, 329 00:15:32.280 --> 00:15:34.020 and I'm gonna remove that. 330 00:15:34.020 --> 00:15:36.240 Now, my value, I don't want that called value. 331 00:15:36.240 --> 00:15:39.180 I actually want that called genre. 332 00:15:39.180 --> 00:15:42.360 Okay, so now I have transactional data for my genres. 333 00:15:42.360 --> 00:15:47.177 Now, if I wanted to do the same thing with my people, 334 00:15:48.030 --> 00:15:49.740 I could do the same thing, so I'm gonna do that. 335 00:15:49.740 --> 00:15:52.320 I'm gonna select the director column, 336 00:15:52.320 --> 00:15:55.890 and then I'm going to shift click to select the other, 337 00:15:55.890 --> 00:15:59.820 the actor columns, and once again, I'm gonna unpivot. 338 00:15:59.820 --> 00:16:04.820 And now each one of these has multiple entries, 339 00:16:05.310 --> 00:16:10.310 one for each person, and I have the associated role 340 00:16:12.420 --> 00:16:13.912 for each of those people. 341 00:16:13.912 --> 00:16:18.330 Now, you will notice that now I have a whole bunch 342 00:16:18.330 --> 00:16:21.900 of duplicates for this movie, and this may be a bad thing, 343 00:16:21.900 --> 00:16:24.480 this may be a good thing, depending on the type of analysis 344 00:16:24.480 --> 00:16:25.740 that you're doing. 345 00:16:25.740 --> 00:16:28.110 There are different ways to do this. 346 00:16:28.110 --> 00:16:30.870 I can't get into that in this class, but long story short, 347 00:16:30.870 --> 00:16:34.350 for simplicity's sake, this sort of made sense to me. 348 00:16:34.350 --> 00:16:38.070 In this case, unlike genre, I want to keep this role, 349 00:16:38.070 --> 00:16:40.080 and I'm gonna call it role, 350 00:16:40.080 --> 00:16:43.620 because I wanna separate my directors maybe from my actors. 351 00:16:43.620 --> 00:16:46.290 But I do want to replace the values. 352 00:16:46.290 --> 00:16:47.910 Instead of star 1, 2, 3, 4, 353 00:16:47.910 --> 00:16:50.730 that makes them feel like different types of things, 354 00:16:50.730 --> 00:16:53.130 what I actually want to do is replace values, 355 00:16:53.130 --> 00:16:58.130 and if I find star one, I'm gonna just call that actor. 356 00:16:59.190 --> 00:17:03.240 Okay, so it did it all down the entire data set. 357 00:17:03.240 --> 00:17:07.353 I'm gonna do that again with star two, actor, 358 00:17:08.370 --> 00:17:13.260 and let me shift or copy that so I can just paste later on. 359 00:17:13.260 --> 00:17:14.610 I'm gonna do it two more times. 360 00:17:14.610 --> 00:17:17.730 I'm also gonna replace star three with actor, 361 00:17:17.730 --> 00:17:22.730 and then I'm also gonna replace star four with actor. 362 00:17:24.390 --> 00:17:26.040 So now I just have the different roles 363 00:17:26.040 --> 00:17:27.360 for each one of these people. 364 00:17:27.360 --> 00:17:28.193 Long story short, 365 00:17:28.193 --> 00:17:30.510 the data is a little bit easier to work with. 366 00:17:30.510 --> 00:17:32.610 You know, I probably could have done it 367 00:17:32.610 --> 00:17:34.990 in a slightly different way so I didn't have literally 368 00:17:34.990 --> 00:17:37.980 a dozen rows now for each one of these movies, 369 00:17:37.980 --> 00:17:40.560 but for now, this is good enough. 370 00:17:40.560 --> 00:17:45.560 So we're gonna do one more thing, and in fact, 371 00:17:46.110 --> 00:17:48.143 we might, we're gonna do two more things. 372 00:17:48.990 --> 00:17:51.120 We're gonna do something which brings up 373 00:17:51.120 --> 00:17:54.453 a data literacy question, which is this. 374 00:17:56.700 --> 00:18:00.420 Look at the gross revenues column, and by the way, 375 00:18:00.420 --> 00:18:03.180 you'll notice it actually is using currency, 376 00:18:03.180 --> 00:18:06.330 so it actually has, you know, two decimal places. 377 00:18:06.330 --> 00:18:09.090 It's actually currency, it's treating it as actual dollars, 378 00:18:09.090 --> 00:18:11.340 not just a number. 379 00:18:11.340 --> 00:18:14.070 The difference between currency and just a number, 380 00:18:14.070 --> 00:18:15.510 I don't know if it even matters. 381 00:18:15.510 --> 00:18:17.584 I'm actually gonna convert this back to a whole number, 382 00:18:17.584 --> 00:18:18.417 'cause I don't even wanna look 383 00:18:18.417 --> 00:18:20.220 at those stupid decimal places. 384 00:18:20.220 --> 00:18:24.930 Now put your data literacy hat on for a second, 385 00:18:24.930 --> 00:18:28.503 and think about this gross revenues column. 386 00:18:29.520 --> 00:18:33.330 Do you have any concerns about that gross revenues column 387 00:18:33.330 --> 00:18:35.030 that you think we should clean up? 388 00:18:37.560 --> 00:18:40.830 Now, if this was a live in-person class, I would call 389 00:18:40.830 --> 00:18:42.960 on people and ask them to tell me what they noticed. 390 00:18:42.960 --> 00:18:45.509 Can't do that here, so let me just tell you the answer. 391 00:18:45.509 --> 00:18:47.430 The answer is this. 392 00:18:47.430 --> 00:18:50.730 Let me scroll back over to released year, 393 00:18:50.730 --> 00:18:54.307 and let me sort in ascending order. 394 00:18:54.307 --> 00:18:58.800 "1920," this movie "1920" had gross revenues 395 00:18:58.800 --> 00:19:02.880 of whatever it had, gross revenues, where'd that go? 396 00:19:02.880 --> 00:19:05.100 There it is, well, and that one is null. 397 00:19:05.100 --> 00:19:07.980 This other movie from 1920 had gross revenues 398 00:19:07.980 --> 00:19:12.980 of $5.45 million, over $5 million in 1921 in that case. 399 00:19:16.050 --> 00:19:18.570 That seems like a lot of money for 1921, 400 00:19:18.570 --> 00:19:22.713 but if I compare that to like "Avatar," that made nothing. 401 00:19:24.150 --> 00:19:26.850 That's the problem with this column. 402 00:19:26.850 --> 00:19:31.230 It's the gross revenues in actual dollars. 403 00:19:31.230 --> 00:19:33.930 It's not adjusted for inflation, 404 00:19:33.930 --> 00:19:38.400 so what I wanna do is I want to adjust for inflation. 405 00:19:38.400 --> 00:19:40.920 How am I gonna do that? 406 00:19:40.920 --> 00:19:43.260 It's a two-step process, and this brings up one 407 00:19:43.260 --> 00:19:47.223 of the insanely amazing wonderful things about Power Query. 408 00:19:48.150 --> 00:19:53.150 I can bring in additional data and merge data sets together, 409 00:19:53.760 --> 00:19:57.270 and do all kinds of fancy stuff, so I'm gonna do that. 410 00:19:57.270 --> 00:20:00.630 I'm gonna click up here, I'm on the home tab over here, 411 00:20:00.630 --> 00:20:03.330 up here on the far right, new source. 412 00:20:03.330 --> 00:20:05.040 I'm gonna pull in another file. 413 00:20:05.040 --> 00:20:07.470 I have another CSV file waiting for me, 414 00:20:07.470 --> 00:20:11.460 which is a CPI, consumer price index file 415 00:20:11.460 --> 00:20:13.680 that I downloaded from a government website, 416 00:20:13.680 --> 00:20:16.500 and as you can see here, it has a bunch of data in it. 417 00:20:16.500 --> 00:20:19.140 I'm just gonna load it in, 418 00:20:19.140 --> 00:20:21.840 and you'll notice over here on the left, I have two queries. 419 00:20:21.840 --> 00:20:23.070 These are like my two data sources. 420 00:20:23.070 --> 00:20:25.020 I have the IMDB data source we were just looking at, 421 00:20:25.020 --> 00:20:26.910 and I have the CPI data source. 422 00:20:26.910 --> 00:20:29.700 Now, as you'll notice, this is a good example 423 00:20:29.700 --> 00:20:30.750 of what I've talked about before. 424 00:20:30.750 --> 00:20:34.020 I have a row that is useless, and I have another row 425 00:20:34.020 --> 00:20:35.460 that actually has my column headers, 426 00:20:35.460 --> 00:20:37.290 and then I have these useless column headers up here, 427 00:20:37.290 --> 00:20:41.940 so I wanna remove the first row, remove top rows, 428 00:20:41.940 --> 00:20:45.990 row number one, and I want to promote the headers, 429 00:20:45.990 --> 00:20:48.917 use first row as headers, so now I actually have the year 430 00:20:48.917 --> 00:20:53.917 and then the each whatever other column as it is. 431 00:20:54.060 --> 00:20:56.250 Now, I have monthly CPI values. 432 00:20:56.250 --> 00:20:57.240 I don't need monthly values. 433 00:20:57.240 --> 00:21:00.030 What I really need is just the annual value, 434 00:21:00.030 --> 00:21:02.430 so I'm gonna delete all these extra monthly columns 435 00:21:02.430 --> 00:21:05.250 and get rid of all those, and just keep this average. 436 00:21:05.250 --> 00:21:07.650 I assume that that's the annual average. 437 00:21:07.650 --> 00:21:10.110 So I'm just gonna remove those columns, 438 00:21:10.110 --> 00:21:11.910 and I'm also gonna get rid of these two things. 439 00:21:11.910 --> 00:21:12.810 I don't even know what they are. 440 00:21:12.810 --> 00:21:14.790 I don't care, useless to me. 441 00:21:14.790 --> 00:21:16.770 These are all the only numbers I need, 442 00:21:16.770 --> 00:21:20.100 the year and the CPI value. 443 00:21:20.100 --> 00:21:21.390 Now of course, I'm gonna explain what I'm gonna do 444 00:21:21.390 --> 00:21:22.500 with those in a second. 445 00:21:22.500 --> 00:21:24.530 So I now have this query ready to go, 446 00:21:24.530 --> 00:21:28.350 and I have this query ready to go. 447 00:21:28.350 --> 00:21:31.500 What I wanna do is I wanna merge the CPI data 448 00:21:31.500 --> 00:21:33.856 into this data, essentially create a new column 449 00:21:33.856 --> 00:21:38.130 with that CPI value assigned and essentially matched up 450 00:21:38.130 --> 00:21:40.920 with each year, and then I'll show again. 451 00:21:40.920 --> 00:21:42.000 I'm gonna show you what I do with that. 452 00:21:42.000 --> 00:21:46.590 So I have my IMDB top 1,000 data source. 453 00:21:46.590 --> 00:21:48.930 I'm gonna click on this button up here, merge queries. 454 00:21:48.930 --> 00:21:50.400 You can see there's one called append queries. 455 00:21:50.400 --> 00:21:53.520 If I append, it just adds more rows at the bottom, 456 00:21:53.520 --> 00:21:54.690 but I wanna merge 'em together. 457 00:21:54.690 --> 00:21:56.880 I want this row data to be matched with this row 458 00:21:56.880 --> 00:21:59.310 from that other data set, so that's what merging is. 459 00:21:59.310 --> 00:22:02.220 So I'm gonna click merge, and it says, okay, 460 00:22:02.220 --> 00:22:05.070 so this is what you wanna merge into, okay, got it, 461 00:22:05.070 --> 00:22:06.210 and then I say, yeah, 462 00:22:06.210 --> 00:22:09.690 and what I want to use is the CPI data. 463 00:22:09.690 --> 00:22:13.470 And then I have to tell it which one of these things match? 464 00:22:13.470 --> 00:22:14.400 What am I merging on? 465 00:22:14.400 --> 00:22:18.480 And I'm merging on the released year for this data set, 466 00:22:18.480 --> 00:22:20.790 and the year column for this data set. 467 00:22:20.790 --> 00:22:23.340 And sometimes you get this little warning, privacy levels. 468 00:22:23.340 --> 00:22:25.560 Just click on ignore, save. 469 00:22:25.560 --> 00:22:28.240 If that just comes up, you can just ignore it. 470 00:22:28.240 --> 00:22:30.450 You can see there are different types of joins, 471 00:22:30.450 --> 00:22:32.970 which if you ever learned data, relational data, 472 00:22:32.970 --> 00:22:35.940 SQL, et cetera, you can use those in different ways. 473 00:22:35.940 --> 00:22:38.310 I'm not gonna worry about those, can't explain 'em now. 474 00:22:38.310 --> 00:22:40.020 This is also what's called fuzzy matching, 475 00:22:40.020 --> 00:22:43.650 sort of a smart join, also not useful for us here. 476 00:22:43.650 --> 00:22:45.420 Long story short, I'm merging on year. 477 00:22:45.420 --> 00:22:47.940 I click okay, and watch what happens. 478 00:22:47.940 --> 00:22:50.624 Now, it brings in something called CPI, 479 00:22:50.624 --> 00:22:55.200 and it brings in the whole table, so it looks kind of weird. 480 00:22:55.200 --> 00:22:58.170 There's this little button up here, I can click on that. 481 00:22:58.170 --> 00:23:00.960 Instead of bringing in both the year and the average, 482 00:23:00.960 --> 00:23:02.580 'cause the year is already established, 483 00:23:02.580 --> 00:23:04.890 what I just wanna bring is that average, of that number. 484 00:23:04.890 --> 00:23:06.247 So I click okay and boom, 485 00:23:06.247 --> 00:23:10.920 I have this CPI value for each one of these years. 486 00:23:10.920 --> 00:23:15.920 So the "1920" CPI number happens to be 20, 487 00:23:16.270 --> 00:23:19.320 and again, I'll explain what this all means in a second. 488 00:23:19.320 --> 00:23:21.840 I'm just gonna rename this CPI 489 00:23:21.840 --> 00:23:24.420 so I don't get confused by the labeling. 490 00:23:24.420 --> 00:23:26.700 Now the question becomes, what am I gonna do with it? 491 00:23:26.700 --> 00:23:30.840 All right, here is where the magic happens, more magic, 492 00:23:30.840 --> 00:23:32.310 'cause that was already pretty damn magical. 493 00:23:32.310 --> 00:23:36.420 Okay, so I'm gonna do what's called add a column. 494 00:23:36.420 --> 00:23:38.550 What I'm gonna be doing is I want to convert 495 00:23:38.550 --> 00:23:42.180 those gross revenue dollars into an adjusted gross, 496 00:23:42.180 --> 00:23:45.990 using the CPI, the consumer price index, 497 00:23:45.990 --> 00:23:47.525 and there's a formula for this. 498 00:23:47.525 --> 00:23:49.290 And so what I'm gonna do is I'm gonna add a column, 499 00:23:49.290 --> 00:23:52.980 and I'm gonna use what's called a custom column. 500 00:23:52.980 --> 00:23:57.360 In a custom column, I can essentially write a formula. 501 00:23:57.360 --> 00:24:02.360 And I'm gonna call this adjusted underscore gross, okay? 502 00:24:03.930 --> 00:24:08.930 And the formula is you want to take the gross revenues 503 00:24:09.660 --> 00:24:13.320 for each row, so this is gonna happen for every single row. 504 00:24:13.320 --> 00:24:15.990 Take the gross revenue, so what I wanna do is 505 00:24:15.990 --> 00:24:18.270 I want to multiply that number 506 00:24:18.270 --> 00:24:22.290 by the most recent year's CPI value. 507 00:24:22.290 --> 00:24:25.200 So if you look at the CPI dataset, you'll notice that 508 00:24:25.200 --> 00:24:29.010 for 2021, which is the most recent year in that data set, 509 00:24:29.010 --> 00:24:32.700 the CPI value is 270.97. 510 00:24:32.700 --> 00:24:34.860 So I'm just gonna hard code that for now. 511 00:24:34.860 --> 00:24:37.650 In a real project, I'd probably figure out a way 512 00:24:37.650 --> 00:24:41.370 to actually tie that dynamically to the last row, 513 00:24:41.370 --> 00:24:43.530 blah, blah, blah, but for this purposes, 514 00:24:43.530 --> 00:24:44.829 I'm gonna multiply by that number. 515 00:24:44.829 --> 00:24:47.280 And then the formula says, 516 00:24:47.280 --> 00:24:52.170 and then divide by this CPI value. 517 00:24:52.170 --> 00:24:53.670 So no matter what the gross revenue is, 518 00:24:53.670 --> 00:24:55.590 it multiplies by that most recent, 519 00:24:55.590 --> 00:24:57.210 which is gonna be the largest number, 520 00:24:57.210 --> 00:25:00.920 and then divide by that year's number, and when I do this, 521 00:25:00.920 --> 00:25:04.380 it creates this new column called adjusted gross. 522 00:25:04.380 --> 00:25:07.050 This first movie had a null gross revenue. 523 00:25:07.050 --> 00:25:10.470 That's why it's null here for the adjusted gross. 524 00:25:10.470 --> 00:25:14.760 But this other movie that made $5 million back in 1921 525 00:25:14.760 --> 00:25:18.030 or whatever it was made 82 million 526 00:25:18.030 --> 00:25:20.490 or whatever that number is, you know, overall, 527 00:25:20.490 --> 00:25:22.980 when you adjust for inflation. 528 00:25:22.980 --> 00:25:25.890 This is how you get adjusted for inflation numbers 529 00:25:25.890 --> 00:25:28.140 just by that simple formula. 530 00:25:28.140 --> 00:25:29.460 So you'll notice other things. 531 00:25:29.460 --> 00:25:31.470 I can also do what's called an index column. 532 00:25:31.470 --> 00:25:35.430 If I do an index column, it'll actually numerically create 533 00:25:35.430 --> 00:25:37.950 a column with a number for every single row. 534 00:25:37.950 --> 00:25:40.980 Now, I didn't do that here, and I wouldn't wanna do it now 535 00:25:40.980 --> 00:25:43.410 'cause I wouldn't wanna have 12 rows with the unique values. 536 00:25:43.410 --> 00:25:46.860 I'd probably want to do that earlier in the process. 537 00:25:46.860 --> 00:25:48.690 I'm gonna show you the final cleaned up version 538 00:25:48.690 --> 00:25:51.600 of this data set in a moment when we start looking 539 00:25:51.600 --> 00:25:53.430 at the data to analyze it. 540 00:25:53.430 --> 00:25:56.040 This was about how to work with Power Query, 541 00:25:56.040 --> 00:25:59.520 an incredibly powerful tool. 542 00:25:59.520 --> 00:26:02.070 I strongly recommend you play around with this. 543 00:26:02.070 --> 00:26:05.340 There's other stuff in this tool that is fantastic 544 00:26:05.340 --> 00:26:07.320 and amazing, and by the way, you'll notice, 545 00:26:07.320 --> 00:26:09.480 like I said before, the applied steps. 546 00:26:09.480 --> 00:26:11.670 Look at this long list of applied steps. 547 00:26:11.670 --> 00:26:13.140 Let's say I wanted to go back here 548 00:26:13.140 --> 00:26:15.870 and change these replaced values. 549 00:26:15.870 --> 00:26:20.130 Remember I replaced the word star one with actor? 550 00:26:20.130 --> 00:26:23.730 What if I change my mind, and I want to replace star one 551 00:26:23.730 --> 00:26:27.690 with smiley face, okay, that word? 552 00:26:27.690 --> 00:26:30.180 I can do that, and it's gonna do it, 553 00:26:30.180 --> 00:26:31.890 but every single other step that I took 554 00:26:31.890 --> 00:26:34.650 after that still exists, so obviously, 555 00:26:34.650 --> 00:26:35.880 I don't want to actually do that. 556 00:26:35.880 --> 00:26:38.790 Lemme revert to what I had there in the first place, 557 00:26:38.790 --> 00:26:40.020 but you get the point. 558 00:26:40.020 --> 00:26:42.480 You can really change things, and by the way, 559 00:26:42.480 --> 00:26:44.580 if I try to to replace this step, 560 00:26:44.580 --> 00:26:46.027 it's gonna give me a warning and say, 561 00:26:46.027 --> 00:26:46.890 "Are you sure you wanna do that? 562 00:26:46.890 --> 00:26:49.080 You might break something and happen later." 563 00:26:49.080 --> 00:26:52.260 I can get away with doing that in this particular type 564 00:26:52.260 --> 00:26:55.350 of a step, but sometimes you gotta be careful about that. 565 00:26:55.350 --> 00:26:57.570 One other note, on the home tab, 566 00:26:57.570 --> 00:27:00.660 we had this advanced editor over here. 567 00:27:00.660 --> 00:27:02.700 Watch what happens if I click this open. 568 00:27:02.700 --> 00:27:07.700 What this is is code, but it's code for all those steps. 569 00:27:09.630 --> 00:27:12.900 In other words, you can do all those steps by writing code, 570 00:27:12.900 --> 00:27:15.270 or let's say down the road, I want to do, 571 00:27:15.270 --> 00:27:19.320 replace all the times I do replace this with the word actor. 572 00:27:19.320 --> 00:27:21.900 If I just wanted to change that to smiley face, 573 00:27:21.900 --> 00:27:25.170 I could just copy, paste, paste, paste in here, hit save, 574 00:27:25.170 --> 00:27:27.780 and rerun it, and it'll do it. 575 00:27:27.780 --> 00:27:30.330 So sometimes it's faster to work with the code 576 00:27:30.330 --> 00:27:32.940 than to work with button clicking. 577 00:27:32.940 --> 00:27:34.980 Long story short, Power Query, 578 00:27:34.980 --> 00:27:39.980 an amazingly fantastic and useful tool. 579 00:27:40.140 --> 00:27:41.670 Use it, learn it, use it. 580 00:27:41.670 --> 00:27:43.110 I promise you you're gonna love it. 581 00:27:43.110 --> 00:27:45.000 I'm gonna click close and load now, 582 00:27:45.000 --> 00:27:50.000 and you'll see it loads the data into Excel. 583 00:27:50.280 --> 00:27:51.930 So I have the CPI data set, 584 00:27:51.930 --> 00:27:53.340 'cause that was one of the existing queries. 585 00:27:53.340 --> 00:27:56.070 I have the IMDB top 1,000 data set. 586 00:27:56.070 --> 00:27:58.470 That's one of the queries, and I also have a new sheet 587 00:27:58.470 --> 00:28:00.235 that I can start to do other stuff in. 588 00:28:00.235 --> 00:28:03.510 So that's really, really powerful. 589 00:28:03.510 --> 00:28:08.510 Now, you may then ask, "Okay, well, now what?" 590 00:28:08.580 --> 00:28:12.180 If I need to do this again, or load a new data set, 591 00:28:12.180 --> 00:28:14.160 or just go back into the changes I just, 592 00:28:14.160 --> 00:28:16.260 oh, I made a mistake, lemme fix it, 593 00:28:16.260 --> 00:28:18.572 I can just under here queries and connections, 594 00:28:18.572 --> 00:28:22.200 double click into it, and boom, Power Query comes back up. 595 00:28:22.200 --> 00:28:24.870 Like I said, all those applied steps saved. 596 00:28:24.870 --> 00:28:28.470 I can go back in here and tweak, and change, and update, 597 00:28:28.470 --> 00:28:33.470 delete, all to my heart's content, and this lasts forever. 598 00:28:33.570 --> 00:28:36.603 So it's not like undo, it's much, much, much more powerful. 599 00:28:37.802 --> 00:28:40.560 So that is data cleaning. 600 00:28:40.560 --> 00:28:43.413 We're gonna talk next about data analytics.