1 00:00:02,093 --> 00:00:05,373 [Instructor] And so I wanna show you how to do an F-test 2 00:00:08,580 --> 00:00:13,580 that tests the significance of a number of variables, 3 00:00:14,730 --> 00:00:16,530 even when, individually, 4 00:00:16,530 --> 00:00:21,530 they don't pass a T-test for significance. 5 00:00:22,260 --> 00:00:27,260 So we'll go to the MLB dataset, 6 00:00:27,630 --> 00:00:31,950 and we're gonna run this regression right here. 7 00:00:31,950 --> 00:00:35,970 So log salary and the regressors are years, games per year, 8 00:00:35,970 --> 00:00:39,693 batting average, home runs per year and RBIs per year. 9 00:00:40,860 --> 00:00:45,860 So we'll go to the data and we run a linear regression, 10 00:00:49,260 --> 00:00:52,590 where L salary 11 00:00:52,590 --> 00:00:57,590 is the dependent and then the independent is years, 12 00:01:04,800 --> 00:01:08,820 games per year, home runs per year, 13 00:01:08,820 --> 00:01:13,803 RBIs per year, and batting average. 14 00:01:16,530 --> 00:01:20,250 We don't need to save or plot or anything like that. 15 00:01:20,250 --> 00:01:21,340 We're just gonna 16 00:01:23,700 --> 00:01:24,533 run it. 17 00:01:24,533 --> 00:01:26,080 And we find that 18 00:01:27,840 --> 00:01:29,103 our three, 19 00:01:31,470 --> 00:01:35,733 oh, oh, it didn't include the batting average. 20 00:01:37,170 --> 00:01:38,446 Hold on. 21 00:01:38,446 --> 00:01:40,680 Let's run that again. 22 00:01:40,680 --> 00:01:44,730 Regression, linear, and then we just need batting average 23 00:01:44,730 --> 00:01:45,633 in there too. 24 00:01:47,040 --> 00:01:48,480 I must not have added it. 25 00:01:48,480 --> 00:01:53,480 So games per year, home runs per year, 26 00:01:53,520 --> 00:01:57,360 years, RBIs per year, batting average. 27 00:01:57,360 --> 00:01:58,650 Let's make sure that those are there. 
28 00:01:58,650 --> 00:02:01,080 Years, games per year, batting average, home runs per year, 29 00:02:01,080 --> 00:02:03,420 and RBIs per year on log salary. 30 00:02:03,420 --> 00:02:04,253 Yep, good. 31 00:02:04,253 --> 00:02:06,360 So now we go, boom. 32 00:02:06,360 --> 00:02:08,313 And we find that, 33 00:02:13,170 --> 00:02:18,170 again, our three batting stats here are not significant. 34 00:02:20,970 --> 00:02:25,970 And take note of the sum of squared residuals here. 35 00:02:26,340 --> 00:02:27,173 308. 36 00:02:29,160 --> 00:02:32,883 And we're gonna use that soon. 37 00:02:36,360 --> 00:02:39,690 So now we're gonna restrict our model, 38 00:02:39,690 --> 00:02:42,393 and take out those three, 39 00:02:44,010 --> 00:02:46,590 gonna take out home runs per year, 40 00:02:46,590 --> 00:02:49,620 RBIs per year and batting average. 41 00:02:49,620 --> 00:02:53,853 And we're just gonna run it with these two regressors. 42 00:02:55,260 --> 00:03:00,260 And now what we see is our sum of squared residuals 43 00:03:00,330 --> 00:03:04,680 has gone up, as we expect, because 44 00:03:04,680 --> 00:03:09,680 taking regressors out makes the fit worse, 45 00:03:10,500 --> 00:03:15,480 or rather, decreases the explanatory power. 46 00:03:15,480 --> 00:03:19,650 So we're gonna look at this number now too, 47 00:03:19,650 --> 00:03:21,693 and we're gonna go to the F-stat. 48 00:03:24,960 --> 00:03:29,043 And note that it's gone from 183 to 198, 49 00:03:30,540 --> 00:03:33,300 and our q is three. 50 00:03:33,300 --> 00:03:37,200 I'm gonna move it so you can see this a bit better. 51 00:03:37,200 --> 00:03:41,820 And our F-stat is big, it is about eight, 52 00:03:41,820 --> 00:03:43,753 and eight is big.
53 00:03:43,753 --> 00:03:46,263 And as with any other F-stat, 54 00:03:47,580 --> 00:03:48,850 when you have 55 00:03:52,200 --> 00:03:53,520 a big F-stat, 56 00:03:53,520 --> 00:03:57,483 that is grounds to reject our null, 57 00:03:58,710 --> 00:04:02,730 and assume our null is not true, 58 00:04:02,730 --> 00:04:07,030 that these three batting statistics jointly 59 00:04:08,940 --> 00:04:11,130 are all equal to zero. 60 00:04:11,130 --> 00:04:13,920 So again, our null is that these three things 61 00:04:13,920 --> 00:04:15,363 that we took out. 62 00:04:17,370 --> 00:04:18,360 That's wrong. 63 00:04:18,360 --> 00:04:19,353 It's here. 64 00:04:21,060 --> 00:04:23,220 These three batting stats here, 65 00:04:23,220 --> 00:04:26,100 home runs, RBI and batting average. 66 00:04:26,100 --> 00:04:30,390 We make the assumption, our null, that they're all zero. 67 00:04:30,390 --> 00:04:33,450 We drop 'em out of the model, we do an F-test, 68 00:04:33,450 --> 00:04:38,450 we get a huge F-stat, well out in the tails, 69 00:04:38,460 --> 00:04:42,150 very unlikely that we would get that if our null was true, 70 00:04:42,150 --> 00:04:46,590 so we conclude our null is then not true. 71 00:04:46,590 --> 00:04:48,780 We reject our null and in this case, 72 00:04:48,780 --> 00:04:53,400 we would use this full model and say that, jointly, 73 00:04:53,400 --> 00:04:56,820 home runs per year, RBIs per year and the batting average, 74 00:04:56,820 --> 00:05:01,820 jointly, they matter even though the individual P-values are not such 75 00:05:03,360 --> 00:05:07,803 that we would reject the null for any one of them in the full model.
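The comparison walked through above (full model versus restricted model, with q dropped regressors) can be sketched in a few lines of Python. This is only a minimal sketch of the standard F statistic for exclusion restrictions, not the software used in the video. The function name `f_statistic`, its parameter names, and the numbers in the example call are all hypothetical, chosen for easy arithmetic. The residual degrees of freedom of the full model are not stated in the transcript, so they are passed in as an argument.

```python
def f_statistic(ssr_restricted, ssr_unrestricted, q, df_unrestricted):
    """F statistic for testing q exclusion restrictions.

    ssr_restricted   -- sum of squared residuals with the q regressors dropped
    ssr_unrestricted -- sum of squared residuals of the full model
    q                -- number of regressors dropped (restrictions tested)
    df_unrestricted  -- residual degrees of freedom of the full model (n - k - 1)
    """
    if q <= 0 or df_unrestricted <= 0:
        raise ValueError("q and df_unrestricted must be positive")
    if ssr_unrestricted <= 0:
        raise ValueError("ssr_unrestricted must be positive")
    # Increase in SSR per restriction imposed.
    numerator = (ssr_restricted - ssr_unrestricted) / q
    # Estimated error variance from the full model.
    denominator = ssr_unrestricted / df_unrestricted
    return numerator / denominator


# Hypothetical numbers, NOT the MLB output: (120 - 100) / 2 = 10 and 100 / 50 = 2.
example_f = f_statistic(120.0, 100.0, q=2, df_unrestricted=50)
print(example_f)  # 5.0
```

To apply this to the regression in the video, you would plug in the two sums of squared residuals read off the output, q equal to three, and the residual degrees of freedom reported for the full model. A large value is then compared against the F distribution with q and `df_unrestricted` degrees of freedom.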