1 00:00:02,830 --> 00:00:05,240 - [Instructor] Hello, and welcome to the video lecture 2 00:00:05,240 --> 00:00:09,650 on multiple regression analysis and estimation. 3 00:00:09,650 --> 00:00:11,400 So we're gonna be looking at 4 00:00:12,310 --> 00:00:15,430 how to do a linear regression 5 00:00:15,430 --> 00:00:19,230 with more than one regressor 6 00:00:19,230 --> 00:00:24,230 and look at some of the properties of the model, 7 00:00:25,610 --> 00:00:29,180 what makes a good model, some of the assumptions 8 00:00:29,180 --> 00:00:33,070 that make for an unbiased estimator, 9 00:00:33,070 --> 00:00:37,310 much like we did last time, but now adding more regressors. 10 00:00:37,310 --> 00:00:39,830 And we're also gonna think about 11 00:00:41,970 --> 00:00:43,360 how to understand better 12 00:00:43,360 --> 00:00:46,143 what makes for a more efficient estimator. 13 00:00:49,750 --> 00:00:53,290 We'll start with the K=2, the 2 regressor, 14 00:00:53,290 --> 00:00:56,290 and then look at the general form 15 00:00:56,290 --> 00:01:00,590 where there's K regressors, look at the residuals, 16 00:01:00,590 --> 00:01:03,423 how we derive the OLS estimator, 17 00:01:05,260 --> 00:01:08,930 look at the ceteris paribus assumption, 18 00:01:08,930 --> 00:01:11,750 goodness of fit, the degrees of freedoms, 19 00:01:11,750 --> 00:01:14,763 and then again, looking at those assumptions, 20 00:01:15,860 --> 00:01:18,080 which, if they all hold, 21 00:01:18,080 --> 00:01:23,080 that we can say that OLS is the best unbiased estimator, 22 00:01:24,070 --> 00:01:28,610 and then last, looking at 23 00:01:28,610 --> 00:01:31,550 how do we know what the right model is? 24 00:01:31,550 --> 00:01:36,550 What happens if we add a irrelevant regressor, 25 00:01:37,510 --> 00:01:41,207 and what happens if we miss a relevant regressor? 26 00:01:43,790 --> 00:01:46,750 So here is the homework. 27 00:01:46,750 --> 00:01:50,363 So I want you to think about what is multicollinearity, 28 00:01:51,250 --> 00:01:55,513 what's the consequence and what's perfect collinearity, 29 00:01:56,410 --> 00:01:59,860 what happens if you omit a variable, 30 00:01:59,860 --> 00:02:02,370 and when doesn't it matter? 31 00:02:02,370 --> 00:02:07,030 Three, I want you to unpack the three factors 32 00:02:07,030 --> 00:02:11,580 of the OLS estimator's variance. 33 00:02:11,580 --> 00:02:14,420 What drives the variance in this estimator? 34 00:02:14,420 --> 00:02:16,559 And then there are also a couple 35 00:02:16,559 --> 00:02:19,893 of computer homework problems. 36 00:02:21,680 --> 00:02:25,670 Last time, we looked at the one regressor model. 37 00:02:25,670 --> 00:02:27,190 Now we're gonna look at the two. 38 00:02:27,190 --> 00:02:29,160 So now we're looking at a model 39 00:02:29,160 --> 00:02:32,080 where on the left-hand side is wage, 40 00:02:32,080 --> 00:02:34,090 so what drives whether or not somebody 41 00:02:34,090 --> 00:02:36,480 has a low or a high wage. 42 00:02:36,480 --> 00:02:39,280 And we're looking at two regressors, 43 00:02:39,280 --> 00:02:42,500 education and experience. 44 00:02:42,500 --> 00:02:45,210 So we're really interested mostly in education. 45 00:02:45,210 --> 00:02:47,660 So basically does education pay for itself? 46 00:02:47,660 --> 00:02:49,813 What's the returns to education? 47 00:02:50,935 --> 00:02:54,593 And may also be interested in experience, 48 00:02:56,050 --> 00:03:01,050 but we know that if we omit experience 49 00:03:01,080 --> 00:03:04,780 that it goes into the error term. 
50 00:03:04,780 --> 00:03:09,400 And it is very likely that if we asked a survey 51 00:03:09,400 --> 00:03:13,080 that education and experience 52 00:03:13,080 --> 00:03:15,413 may be related, 53 00:03:17,580 --> 00:03:20,980 that they are correlated. 54 00:03:20,980 --> 00:03:24,730 For example, the more education that you have, 55 00:03:24,730 --> 00:03:26,440 maybe you're spending more, 56 00:03:26,440 --> 00:03:28,260 spent more of your life on education, 57 00:03:28,260 --> 00:03:31,610 so you have less experience. 58 00:03:31,610 --> 00:03:33,560 That may be one way of thinking about it, 59 00:03:33,560 --> 00:03:36,500 but you can think they're almost certainly correlated. 60 00:03:36,500 --> 00:03:41,500 So if they are, and we forget to put in experience, 61 00:03:42,050 --> 00:03:46,023 then experience, we know, is in the error term, 62 00:03:47,772 --> 00:03:49,623 and then, almost certainly, 63 00:03:51,870 --> 00:03:54,310 our assumption of the error term 64 00:03:54,310 --> 00:03:57,800 and the regressors not being correlated is violated. 65 00:03:57,800 --> 00:04:01,373 And thus, we will almost certainly have a biased estimator. 66 00:04:05,350 --> 00:04:06,733 As another example, 67 00:04:07,570 --> 00:04:11,290 we might wanna look at a model where the test score, 68 00:04:11,290 --> 00:04:15,400 so what is the test score of some school 69 00:04:15,400 --> 00:04:19,480 based on how much that school district 70 00:04:21,710 --> 00:04:23,670 spends on schools, their expenditure, 71 00:04:23,670 --> 00:04:26,090 and the average household income. 72 00:04:26,090 --> 00:04:28,260 So if we only, so we're really interested, 73 00:04:28,260 --> 00:04:30,060 again, in expenditure. 74 00:04:30,060 --> 00:04:33,870 Does higher expenditure lead to higher test scores? 75 00:04:33,870 --> 00:04:35,990 That was, I would say, what we would think, 76 00:04:35,990 --> 00:04:37,690 and maybe even what we would hope, 77 00:04:39,510 --> 00:04:43,210 but if we forget to put in average income, 78 00:04:43,210 --> 00:04:45,023 almost certainly, 79 00:04:48,120 --> 00:04:51,610 expenditure and income are correlated. 80 00:04:51,610 --> 00:04:55,320 So again, if we forget to put it into the model, 81 00:04:55,320 --> 00:04:59,240 then we're almost certainly going to get a biased estimate 82 00:04:59,240 --> 00:05:03,373 of the effect on expenditure. 83 00:05:09,900 --> 00:05:13,150 So in each case, the beta-1, 84 00:05:13,150 --> 00:05:15,710 the one that we're most interested in, 85 00:05:15,710 --> 00:05:18,440 is a measure of 86 00:05:18,440 --> 00:05:22,560 if we increase X1 by one unit, 87 00:05:22,560 --> 00:05:26,493 how much does the Y increase? 88 00:05:27,410 --> 00:05:32,260 So basically we want to include all the relevant regressors, 89 00:05:32,260 --> 00:05:33,280 so we can account for them, 90 00:05:33,280 --> 00:05:35,580 so they don't end up in the error term, 91 00:05:35,580 --> 00:05:39,550 so we don't have a biased estimator. 92 00:05:39,550 --> 00:05:41,490 We want to account for everything, 93 00:05:41,490 --> 00:05:43,190 so we can make a good case 94 00:05:43,190 --> 00:05:48,143 that the estimator for our beta-1 is unbiased. 95 00:05:52,450 --> 00:05:56,450 Remember back that an absolutely essential assumption 96 00:05:56,450 --> 00:05:59,040 is that our error term is uncorrelated 97 00:05:59,040 --> 00:06:00,780 with any of our regressors. 98 00:06:00,780 --> 00:06:02,620 So now that we have two regressors, 99 00:06:02,620 --> 00:06:05,853 it must be uncorrelated with both. 
100 00:06:08,980 --> 00:06:12,220 Thinking ahead, to have an unbiased estimator, 00:06:12,220 --> 00:06:15,520 the error term must be uncorrelated 00:06:15,520 --> 00:06:18,710 with both regressors. 00:06:18,710 --> 00:06:21,500 So that would be 00:06:21,500 --> 00:06:24,630 that in the population 00:06:24,630 --> 00:06:28,940 the average residual 00:06:28,940 --> 00:06:31,800 for any individual, so the expected value, 00:06:31,800 --> 00:06:33,110 would be zero. 00:06:33,110 --> 00:06:36,620 And that would hold true for any value of X1 or X2. 00:06:36,620 --> 00:06:40,970 So no matter what the respondents say on the survey, 00:06:40,970 --> 00:06:43,840 no matter what their X1 and X2 is, 00:06:43,840 --> 00:06:48,193 the expected value of the residual is zero. 00:06:49,820 --> 00:06:54,820 And remember that that's important 00:06:54,830 --> 00:06:58,303 because if that is not true, 00:07:00,398 --> 00:07:03,370 if dU/dX1 00:07:03,370 --> 00:07:05,380 does not equal zero, 00:07:05,380 --> 00:07:09,070 then we can't tell as we change X1 00:07:09,070 --> 00:07:12,350 whether the change in the observed Y 00:07:12,350 --> 00:07:17,263 is due to a change in Y-hat or a change in the error term. 00:07:23,130 --> 00:07:26,440 Much the same holds when we look at the general case, 00:07:26,440 --> 00:07:28,030 so K regressors. 00:07:28,030 --> 00:07:31,270 So in many cases, we're going to have more than one, 00:07:31,270 --> 00:07:35,920 more than two, but a good number of regressors. 00:07:35,920 --> 00:07:39,590 So to think about the original example, 00:07:39,590 --> 00:07:42,420 wages are probably affected 00:07:42,420 --> 00:07:45,810 not just by education and experience, 00:07:45,810 --> 00:07:50,810 but by training, by ability, by all kinds of other things. 00:07:51,050 --> 00:07:52,760 Test scores, in the same way, 00:07:52,760 --> 00:07:56,903 have many other factors that would affect them. 00:07:58,260 --> 00:08:02,700 When we think of the example of local food expenditures, 00:08:02,700 --> 00:08:06,930 it wouldn't just be income, but many other factors, 00:08:06,930 --> 00:08:11,710 preferences in where you live and household size 00:08:11,710 --> 00:08:13,460 and all kinds of things like that. 00:08:13,460 --> 00:08:15,380 So you can probably think of other examples 00:08:15,380 --> 00:08:18,913 of where many Xs might affect our Y. 00:08:22,890 --> 00:08:25,360 This equation at the top is the general form, 00:08:25,360 --> 00:08:27,323 where we have K regressors. 00:08:28,420 --> 00:08:31,320 And so when we run it through our software, 00:08:31,320 --> 00:08:34,350 there are K plus one parameters, 00:08:34,350 --> 00:08:37,910 beta-0, beta-1, through beta-k. 00:08:37,910 --> 00:08:40,550 Again, beta-0 is the intercept. 00:08:40,550 --> 00:08:44,520 And remember that we sort of slide it up and down 00:08:44,520 --> 00:08:48,950 so that the expected value of u equals zero. 00:08:48,950 --> 00:08:52,150 And in some cases, it could be interpreted 00:08:52,150 --> 00:08:56,800 as the expected value of Y 00:08:56,800 --> 00:09:01,470 if all the Xs, 00:09:01,470 --> 00:09:03,030 all the regressors, 00:09:03,030 --> 00:09:05,460 equal zero.
148 00:09:05,460 --> 00:09:08,870 And in many cases, we think of beta-1 to beta-k 149 00:09:08,870 --> 00:09:11,330 are called the slope parameters 150 00:09:11,330 --> 00:09:13,933 'cause they measure the sort of change, 151 00:09:15,190 --> 00:09:18,080 the slope of the line for that regressor, 152 00:09:18,080 --> 00:09:22,203 and u is the disturbance term, as always. 153 00:09:28,120 --> 00:09:30,550 So once we get some data, 154 00:09:30,550 --> 00:09:35,550 and we run it through a software package, 155 00:09:35,650 --> 00:09:40,450 like SPSS, we get a number of things back. 156 00:09:40,450 --> 00:09:43,090 One is we get estimates, these beta-hats. 157 00:09:43,090 --> 00:09:45,550 So we get K plus 1 beta-hats. 158 00:09:45,550 --> 00:09:49,870 And then if we take everybody's X and plug it in 159 00:09:49,870 --> 00:09:54,373 and multiply each of these, 160 00:09:55,540 --> 00:09:58,920 each of their Xs times the beta-hat 161 00:09:58,920 --> 00:10:01,490 and get them all, add them all up, 162 00:10:01,490 --> 00:10:04,440 you'll get the Y-hat for that individual. 163 00:10:04,440 --> 00:10:09,020 So the forecast of what we would say 164 00:10:09,020 --> 00:10:11,123 or what would be our best guess, 165 00:10:12,002 --> 00:10:15,930 that that individual who answered X in that way, 166 00:10:15,930 --> 00:10:19,103 this is what the predicted value of their Y is. 167 00:10:20,840 --> 00:10:24,480 And note that when we put hats on things, 168 00:10:24,480 --> 00:10:26,867 that that's always the estimate. 169 00:10:26,867 --> 00:10:29,400 That's just how we sort of denote it. 170 00:10:29,400 --> 00:10:34,310 And if there are N observations, 171 00:10:34,310 --> 00:10:35,650 so we do a survey, 172 00:10:35,650 --> 00:10:40,410 and we get N observations, 173 00:10:40,410 --> 00:10:43,060 again, what OLS does 174 00:10:43,060 --> 00:10:47,160 is it minimizes the sum of squared residuals. 175 00:10:47,160 --> 00:10:51,020 So everybody has a ui-hat, 176 00:10:51,020 --> 00:10:54,170 and that's the difference between their Y-hat 177 00:10:54,170 --> 00:10:58,700 and their, what they actually said Y on the survey. 178 00:10:58,700 --> 00:11:00,900 So we square those and add them up. 179 00:11:00,900 --> 00:11:04,610 And that is 180 00:11:04,610 --> 00:11:07,363 how we get these estimates. 181 00:11:13,100 --> 00:11:16,010 We can have a sample regression line. 182 00:11:16,010 --> 00:11:19,423 So that is everybody's Y-hat, 183 00:11:20,600 --> 00:11:24,580 which we get by taking everybody's beta-hat, 184 00:11:24,580 --> 00:11:28,670 or everybody's X, and multiplying it by the beta-hat, 185 00:11:28,670 --> 00:11:29,800 as we see. 186 00:11:29,800 --> 00:11:31,860 Note that everybody's Y-hat, 187 00:11:31,860 --> 00:11:35,720 every Yi-hat is on the regression line. 188 00:11:35,720 --> 00:11:36,740 Why is that? 189 00:11:36,740 --> 00:11:39,913 So think about that, and we'll discuss it in class. 190 00:11:46,610 --> 00:11:51,550 So we interpret these beta-hats as the partial effect 191 00:11:51,550 --> 00:11:56,550 of a one-unit change of that Xi on Y. 192 00:11:56,720 --> 00:12:01,160 So as we change that X by a unit, 193 00:12:01,160 --> 00:12:06,160 that beta-hat denotes the expected change in Y. 194 00:12:06,480 --> 00:12:10,140 So, if we go and change every X 195 00:12:10,140 --> 00:12:11,290 by some amount of units 196 00:12:14,250 --> 00:12:16,530 and multiply them by the beta-hat, 197 00:12:16,530 --> 00:12:20,653 it gets the expected change in Y-hat. 
198 00:12:21,610 --> 00:12:24,300 Or you can have, 199 00:12:24,300 --> 00:12:27,760 hold all Xs constant except one 200 00:12:27,760 --> 00:12:32,330 and only change one X, and then beta-hat for that X 201 00:12:32,330 --> 00:12:36,443 would be the change in Y just for changing that one. 202 00:12:41,800 --> 00:12:45,870 And this is sort of the magic of regression, 203 00:12:45,870 --> 00:12:50,620 that it allows us to really isolate the effect 204 00:12:50,620 --> 00:12:54,193 of changing one X while holding all else the same. 205 00:12:55,770 --> 00:12:59,860 So we don't have to go 206 00:12:59,860 --> 00:13:02,723 and collect data where, 207 00:13:04,520 --> 00:13:07,660 for the first many Xs, 208 00:13:07,660 --> 00:13:10,770 everybody answers it the same way. 209 00:13:10,770 --> 00:13:13,800 And then only on, say, the fourth or fifth question, 210 00:13:13,800 --> 00:13:18,730 do they change it, that we can collect 211 00:13:18,730 --> 00:13:23,250 where lots and lots of, there are a lot of answers. 212 00:13:23,250 --> 00:13:26,333 Basically, everybody answers it slightly differently, 213 00:13:28,680 --> 00:13:31,640 but the magic of a regression 214 00:13:31,640 --> 00:13:35,040 is we can still isolate, holding all else equal, 215 00:13:35,040 --> 00:13:38,280 what is the change by just changing one X 216 00:13:38,280 --> 00:13:40,033 and everything else stays the same? 217 00:13:42,520 --> 00:13:47,100 We can also measure the effect of changing a lot of Xs 218 00:13:47,100 --> 00:13:49,290 or even all of them. 219 00:13:49,290 --> 00:13:54,083 So all we have to do is sort of plug it, 220 00:13:55,230 --> 00:14:00,203 plug these change in X, into all of these equations, 221 00:14:01,450 --> 00:14:06,270 into the equation, multiply by the various hats, 222 00:14:07,270 --> 00:14:10,063 add them up, and there you go. 223 00:14:11,510 --> 00:14:14,850 Or, you could change by one unit, 224 00:14:14,850 --> 00:14:19,770 or you can change by basically any unit. 225 00:14:19,770 --> 00:14:23,670 If you do change every X by one unit, 226 00:14:23,670 --> 00:14:26,970 the change in Y-hat will just be the sum 227 00:14:26,970 --> 00:14:29,463 of the various beta-hats. 228 00:14:30,880 --> 00:14:34,103 So hopefully, mathematically, all of that makes sense. 229 00:14:38,930 --> 00:14:41,240 Remember that when we run a regression 230 00:14:42,350 --> 00:14:46,460 that every individual 231 00:14:46,460 --> 00:14:50,330 has a Y-hat, the predicted value, 232 00:14:50,330 --> 00:14:55,170 and a u-hat, the value of their residual. 233 00:14:55,170 --> 00:14:59,640 So where do they fall on the regression line? 234 00:14:59,640 --> 00:15:03,220 And then that u-hat measures the difference 235 00:15:03,220 --> 00:15:05,960 between what they actually said 236 00:15:05,960 --> 00:15:08,790 and what we would've predicted they said. 237 00:15:08,790 --> 00:15:13,303 And again, the i is for each individual in the sample. 238 00:15:18,120 --> 00:15:21,280 So you get the Y-hat for each individual 239 00:15:21,280 --> 00:15:26,190 by plugging their Xs in to the model, again, 240 00:15:26,190 --> 00:15:30,720 multiplying them by the beta-hats 241 00:15:30,720 --> 00:15:34,330 and coming up with the Y-hat. 242 00:15:34,330 --> 00:15:38,120 And then we also, everybody has a u-hat. 243 00:15:38,120 --> 00:15:40,700 And we will learn down the road 244 00:15:40,700 --> 00:15:43,540 that we can save them both in SPSS. 245 00:15:43,540 --> 00:15:47,423 So when we go into SPSS, I will show you how to do that. 
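Before moving on to the properties of the residuals, here is a minimal sketch of what the software is doing behind the scenes. It is not SPSS output; it is a Python/NumPy illustration with made-up, simulated data (the variable names, coefficient values, and sample size are all invented for illustration), fitting the two-regressor wage-style model by least squares and then saving the Y-hats and u-hats for every observation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500                                     # hypothetical sample size
educ = rng.uniform(8, 20, size=n)           # years of education (made up)
exper = rng.uniform(0, 40, size=n)          # years of experience (made up)
u = rng.normal(0, 2, size=n)                # error term
wage = 1.0 + 0.8 * educ + 0.2 * exper + u   # an assumed "true" population model

# X matrix with a column of ones for the intercept (beta-0).
X = np.column_stack([np.ones(n), educ, exper])
y = wage

# OLS picks the beta-hats that minimize the sum of squared residuals;
# solving the least-squares problem below does exactly that.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta_hat         # predicted value for each individual
u_hat = y - y_hat            # residual for each individual

print("beta-hats:", beta_hat)                   # close to (1.0, 0.8, 0.2)
print("mean of the residuals:", u_hat.mean())   # essentially zero
```

Note that beta_hat[1] is exactly how much y_hat moves when educ goes up by one unit with exper held fixed, which is the partial-effect reading above, and the mean-zero residual previews the properties listed next.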
246 00:15:51,410 --> 00:15:54,070 Here are a number of mathematical 247 00:15:54,070 --> 00:15:57,670 and statistical properties of the residual. 248 00:15:57,670 --> 00:16:02,417 The sample average of ui-hat equals zero 249 00:16:04,370 --> 00:16:06,310 because the mean is zero. 250 00:16:06,310 --> 00:16:10,650 And so the mean of Y, the observed values, 251 00:16:10,650 --> 00:16:12,860 equals the mean of Y-hat. 252 00:16:12,860 --> 00:16:16,543 That Y-bar-hat equals Y-bar. 253 00:16:19,010 --> 00:16:23,870 the sample covariance between each Xk and u 254 00:16:23,870 --> 00:16:25,143 is zero. 255 00:16:26,870 --> 00:16:31,870 And the point of the mean observation, 256 00:16:32,230 --> 00:16:37,100 so the mean of Y, the mean of X1, 257 00:16:37,100 --> 00:16:42,100 all the way through Xk, always lies on the regression line. 258 00:16:50,320 --> 00:16:54,320 An important thing to note is that adding regressors 259 00:16:54,320 --> 00:16:59,180 almost always changes the value of your beta-hats. 260 00:16:59,180 --> 00:17:01,400 So when you go from beta, 261 00:17:04,820 --> 00:17:09,820 from one regressor to two, so from k=1 to k=2, 262 00:17:11,280 --> 00:17:15,350 the value of your beta-1-hat will change. 263 00:17:15,350 --> 00:17:17,790 In the first model, you only have X1, 264 00:17:17,790 --> 00:17:20,873 and then in another model, you add another X, X2, 265 00:17:21,860 --> 00:17:25,280 there will only be two cases 266 00:17:25,280 --> 00:17:29,190 where the value of beta-1-hat does not change. 267 00:17:29,190 --> 00:17:34,190 First is that if the beta-2-hat equals zero, 268 00:17:34,730 --> 00:17:38,103 so X2 has no effect on Y. 269 00:17:38,980 --> 00:17:43,980 And the other is if X1 and X2 are uncorrelated. 270 00:17:44,350 --> 00:17:47,670 So if X2 is uncorrelated 271 00:17:47,670 --> 00:17:50,440 with Y or with X1, 272 00:17:50,440 --> 00:17:53,460 those are the only two cases where adding X2 273 00:17:53,460 --> 00:17:57,933 will not change the value of our beta-1-hat. 274 00:18:02,480 --> 00:18:04,430 So think about it in this way. 275 00:18:04,430 --> 00:18:05,887 So we run this Y 276 00:18:09,540 --> 00:18:13,040 with two Xs, and then we run it again 277 00:18:13,040 --> 00:18:14,923 with only one X. 278 00:18:17,010 --> 00:18:20,343 Now, suppose that this A1, 279 00:18:22,020 --> 00:18:26,740 which is the coefficient 280 00:18:26,740 --> 00:18:29,780 in our second model, when we didn't include X2, 281 00:18:29,780 --> 00:18:34,550 we could write it as beta-1 plus beta-2 times D, 282 00:18:34,550 --> 00:18:37,890 where D is the slope coefficient 283 00:18:37,890 --> 00:18:39,280 of if you had 284 00:18:41,760 --> 00:18:45,070 regressed X2 on X1. 285 00:18:45,070 --> 00:18:46,210 They will be the same. 286 00:18:46,210 --> 00:18:50,180 So this A1 or A2 will be the same, 287 00:18:50,180 --> 00:18:54,373 only if one of these two things is true, 288 00:18:55,370 --> 00:18:58,510 either B2 here is zero, 289 00:18:58,510 --> 00:19:02,060 so X2 has no effect on Y, 290 00:19:02,060 --> 00:19:05,880 or if X1 and X2 are uncorrelated, 291 00:19:05,880 --> 00:19:08,610 so that if this D is zero. 292 00:19:08,610 --> 00:19:13,610 And I am going to show you in class what this looks like, 293 00:19:14,060 --> 00:19:16,423 kind of drawing a Venn diagram. 294 00:19:22,500 --> 00:19:25,600 And in the general case of K regressors, 295 00:19:32,020 --> 00:19:34,793 that when you add regressors, 296 00:19:35,640 --> 00:19:40,640 usually it's going to change the value of your beta-1. 
297 00:19:40,750 --> 00:19:44,390 So the only time that this would not be true 00:19:44,390 --> 00:19:47,800 is if all the other betas equaled zero, 00:19:47,800 --> 00:19:52,250 so none of the other regressors have any effect on Y, 00:19:52,250 --> 00:19:56,640 or if X1 is uncorrelated with every other X, 00:19:56,640 --> 00:19:57,727 with X2,...,Xk. 00:19:59,610 --> 00:20:04,610 Both of these would be very rare instances. 00:20:04,650 --> 00:20:09,650 Almost always, Y and X2 are gonna have at least some correlation, 00:20:09,680 --> 00:20:14,680 or X1 will have some correlation 00:20:15,510 --> 00:20:20,450 with one or more, probably all, of our other Xs, 00:20:20,450 --> 00:20:21,810 X2 through Xk. 00:20:21,810 --> 00:20:26,160 So the bottom line here is adding or subtracting regressors 00:20:26,160 --> 00:20:29,880 almost always changes the value of every beta. 00:20:29,880 --> 00:20:33,933 And that's why it's so important to include the right ones. 00:20:42,000 --> 00:20:46,340 Thinking again about the concept of R squared, 00:20:46,340 --> 00:20:49,480 how well does our model fit the data? 00:20:49,480 --> 00:20:51,220 How much of the variation in Y 00:20:52,110 --> 00:20:55,040 is explained by the variation in the Xs? 00:20:55,040 --> 00:20:58,140 We can, again, decompose it as SST, 00:20:58,140 --> 00:21:01,670 the total sum of squares, the variation in Y, 00:21:01,670 --> 00:21:06,670 the explained sum of squares, SSE, the variation in Y-hat, 00:21:07,200 --> 00:21:11,040 and the sum of squared residuals, SSR. 00:21:11,040 --> 00:21:14,250 So note that, again, OLS 00:21:14,250 --> 00:21:16,993 makes SSR as small as possible, 00:21:19,400 --> 00:21:20,253 as before. 00:21:25,210 --> 00:21:27,560 So in this drawing here, 00:21:27,560 --> 00:21:29,243 they're writing SST as TSS, 00:21:31,600 --> 00:21:32,433 but it's the same thing. 00:21:32,433 --> 00:21:34,700 It's the total sum of squares. 00:21:34,700 --> 00:21:37,100 So SST 00:21:37,100 --> 00:21:41,190 is the sum of each Y 00:21:41,190 --> 00:21:44,770 minus the mean of Y, squared. 00:21:44,770 --> 00:21:49,373 SSE is the sum of each Y-hat minus Y-bar, squared. 00:21:50,601 --> 00:21:54,300 And SSR is the sum of squared residuals. 00:21:54,300 --> 00:21:58,250 So here's the formula. 00:21:58,250 --> 00:22:00,563 It's one that you've seen before. 00:22:01,740 --> 00:22:04,413 So again, SST equals 00:22:04,413 --> 00:22:09,120 SSE plus SSR, and we do a bit of math, 00:22:09,120 --> 00:22:10,760 and we get R squared, 00:22:10,760 --> 00:22:14,710 which is defined as one minus SSR 00:22:14,710 --> 00:22:16,550 divided by SST. 00:22:16,550 --> 00:22:17,643 So it is that 00:22:20,940 --> 00:22:25,430 part of the variation in Y 00:22:25,430 --> 00:22:28,290 which is explained by the Xs. 00:22:28,290 --> 00:22:32,323 R squared is always a number between zero and one. 00:22:33,210 --> 00:22:34,670 Hardly ever is it zero. 00:22:34,670 --> 00:22:35,750 Hardly ever is it one. 00:22:35,750 --> 00:22:37,650 In fact, in any regression, 00:22:37,650 --> 00:22:40,790 you'll basically never see either one. 00:22:40,790 --> 00:22:44,400 What does it mean if SSR equals zero? 00:22:44,400 --> 00:22:48,323 That's something that you could ponder and think about.
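To see that decomposition in action, here is a minimal sketch, assuming made-up, simulated data rather than a real survey (the variable names and coefficient values are invented for illustration). It fits an OLS model and then computes SST, SSE, and SSR directly, confirming that SST = SSE + SSR and that R squared = 1 - SSR/SST.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                    # two made-up regressors
y = 2.0 + 1.5 * x1 - 0.5 * x2 + rng.normal(0, 1, size=n)

X = np.column_stack([np.ones(n), x1, x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat
u_hat = y - y_hat

SST = np.sum((y - y.mean()) ** 2)          # total variation in Y
SSE = np.sum((y_hat - y.mean()) ** 2)      # explained variation, in Y-hat
SSR = np.sum(u_hat ** 2)                   # sum of squared residuals

print(SST, SSE + SSR)                      # equal up to rounding: SST = SSE + SSR
print("R squared:", 1 - SSR / SST)
```

The degrees-of-freedom-adjusted version discussed next simply replaces SSR and SST with SSR/(n-k-1) and SST/(n-1) before taking the ratio.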
00:22:54,850 --> 00:22:57,450 Here are some properties of R squared. 348 00:22:57,450 --> 00:23:01,390 So it never decreases, and it almost always increases 00:23:01,390 --> 00:23:02,920 when you add a regressor. 00:23:02,920 --> 00:23:06,290 So even if you add a total nonsense regressor, 00:23:06,290 --> 00:23:07,640 like your shoe size 00:23:07,640 --> 00:23:12,640 or how many letters are in your dog's name 00:23:12,940 --> 00:23:14,730 or anything like that 00:23:14,730 --> 00:23:17,800 that has nothing to do with your model, 00:23:17,800 --> 00:23:20,500 it's still going to increase R squared. 00:23:20,500 --> 00:23:23,390 And therefore, it's a poor criterion 00:23:23,390 --> 00:23:25,290 of whether to add a regressor. 00:23:25,290 --> 00:23:27,950 Since it always goes up, 00:23:27,950 --> 00:23:31,720 it's really not going to tell you anything. 00:23:31,720 --> 00:23:35,280 There is a way that you can calculate an adjusted R squared, 00:23:35,280 --> 00:23:37,300 which sort of compensates 00:23:37,300 --> 00:23:40,440 for the loss of degrees of freedom. 00:23:40,440 --> 00:23:45,440 So it sort of looks at whether 00:23:45,680 --> 00:23:49,600 the R squared is better, given that we know 00:23:49,600 --> 00:23:52,740 that we lost some degrees of freedom, 00:23:52,740 --> 00:23:55,190 and compensates for that. 00:23:55,190 --> 00:23:56,833 And that's a better criterion. 00:24:01,110 --> 00:24:03,260 Now we're gonna look at the same kind of assumptions 00:24:03,260 --> 00:24:05,220 that we looked at last time. 00:24:05,220 --> 00:24:08,800 So these are the things that we assume are true 00:24:08,800 --> 00:24:13,670 or that must be true for an OLS model 00:24:13,670 --> 00:24:17,773 in order for it to be the best unbiased estimator. 00:24:18,908 --> 00:24:21,270 And these are the same as you've seen before, 00:24:21,270 --> 00:24:25,330 that it has to be linear in parameters, random sampling, 00:24:25,330 --> 00:24:28,180 non-stochastic Xs 00:24:28,180 --> 00:24:31,100 that are not perfectly collinear, 00:24:31,100 --> 00:24:35,170 and that the residual has to have zero conditional mean. 00:24:35,170 --> 00:24:36,763 So I'm gonna walk through each one. 00:24:39,410 --> 00:24:41,510 So it has to be a linear model, 00:24:41,510 --> 00:24:44,700 meaning that you can actually write the population model 00:24:44,700 --> 00:24:49,700 in these terms, as a function of Y and Xs, as you see here. 00:24:50,940 --> 00:24:54,730 Again, the betas are the unknown parameters, 00:24:54,730 --> 00:24:57,433 and the u is the disturbance term. 00:24:58,540 --> 00:25:00,300 And what this means is that the betas 00:25:00,300 --> 00:25:03,020 cannot have any exponent other than one 00:25:03,020 --> 00:25:05,140 for it to be a linear function. 00:25:05,140 --> 00:25:08,150 Note that the Xs can have exponents, 00:25:08,150 --> 00:25:10,010 so it could be squares or logs 00:25:10,010 --> 00:25:11,890 or square roots or all kinds of other things. 00:25:11,890 --> 00:25:15,930 And I think squared is probably the most common one 00:25:15,930 --> 00:25:17,373 that you're gonna encounter. 00:25:19,180 --> 00:25:22,330 If you think that the relationship has a curve in it, 00:25:22,330 --> 00:25:24,830 that it's not a line, you can often add a square.
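Here is a quick, hedged sketch of that "add a square" idea, again on made-up simulated data (the coefficient values and the use of income are just for illustration). The model stays linear in the parameters even though income squared appears as a regressor, and income and income squared are correlated but not perfectly collinear, so it is perfectly legal.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
income = rng.uniform(20, 120, size=n)      # hypothetical income, in thousands

# A relationship that increases at a decreasing rate, plus noise.
y = 5 + 0.9 * income - 0.004 * income ** 2 + rng.normal(0, 3, size=n)

# Still "linear in parameters": every beta has exponent one,
# even though one regressor is the square of another.
X = np.column_stack([np.ones(n), income, income ** 2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)                            # roughly (5, 0.9, -0.004)
```

Income and income squared are highly correlated here, but not perfectly collinear, so the betas are still defined; that is exactly the distinction drawn in the no-perfect-collinearity assumption that comes next.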
00:25:27,550 --> 00:25:29,370 Next is random sampling. 395 00:25:29,370 --> 00:25:34,370 So again, we draw a sample from a population, 00:25:34,850 --> 00:25:36,523 and it's a random sample. 00:25:37,570 --> 00:25:40,290 So there's no clear selection bias. 00:25:40,290 --> 00:25:44,200 So we don't only choose old folks or men 00:25:44,200 --> 00:25:46,790 or high income or larger households 00:25:46,790 --> 00:25:48,210 or married people or anything like that. 00:25:48,210 --> 00:25:51,723 It's representative of the population. 00:25:53,790 --> 00:25:57,070 Third is no perfect collinearity, 00:25:57,070 --> 00:26:01,740 that each regressor, 00:26:01,740 --> 00:26:03,010 well, one way to think of it, 00:26:03,010 --> 00:26:05,400 must add some new information, 00:26:05,400 --> 00:26:09,110 that there's no perfect linear relationship 00:26:09,110 --> 00:26:11,240 among any of the regressors. 00:26:11,240 --> 00:26:14,630 And we need this for it to work mathematically; 00:26:14,630 --> 00:26:18,800 the betas will not be defined 00:26:18,800 --> 00:26:23,233 if there is perfect collinearity. 00:26:25,470 --> 00:26:27,490 Note that they can be correlated, 00:26:27,490 --> 00:26:31,140 that they almost always are and will be, 00:26:31,140 --> 00:26:34,170 just not perfectly so. 00:26:34,170 --> 00:26:36,860 So if you are a matrix algebra nerd, 00:26:36,860 --> 00:26:39,533 know that our X-transpose-X matrix 00:26:42,070 --> 00:26:47,070 will be a singular matrix. 00:26:47,080 --> 00:26:51,210 It'll have a zero determinant, and it cannot be inverted. 00:26:51,210 --> 00:26:53,460 So much like we saw before, 00:26:53,460 --> 00:26:55,930 it's sorta like dividing by zero. 00:26:55,930 --> 00:26:57,513 It's just undefined. 00:26:58,390 --> 00:27:00,180 It's certainly okay 00:27:00,180 --> 00:27:03,710 for there to be a non-linear relationship among them. 00:27:03,710 --> 00:27:07,810 So you could include both income and income squared, 00:27:07,810 --> 00:27:12,810 again, if you think the relationship has a curve in it, 00:27:13,360 --> 00:27:16,980 like increasing, but at a decreasing rate. 00:27:16,980 --> 00:27:17,953 And that's fine. 00:27:20,730 --> 00:27:23,180 Here are some examples. 00:27:23,180 --> 00:27:25,650 So one might be 00:27:27,670 --> 00:27:32,593 the expenditures in Canadian and US dollars, 00:27:33,630 --> 00:27:38,350 where US dollars equals a times Canadian dollars, 00:27:38,350 --> 00:27:39,563 and a is the exchange rate. 00:27:41,080 --> 00:27:43,950 We often find it, too, with dummy variables. 00:27:43,950 --> 00:27:46,280 So if you code 00:27:46,280 --> 00:27:51,250 whether you live in Vermont as a yes, 00:27:51,250 --> 00:27:53,250 one equals yes, zero equals no, 00:27:53,250 --> 00:27:57,217 and have another variable, non-Vermont, coded the other way, 00:27:58,080 --> 00:27:59,980 the sum of these two is always one. 00:27:59,980 --> 00:28:01,360 So you can't include both. 00:28:01,360 --> 00:28:04,723 You only include one of these two in your model. 00:28:08,000 --> 00:28:13,000 The intuition is, how can we measure the effect of US dollars 00:28:13,810 --> 00:28:18,140 while holding Canadian dollars constant?
442 00:28:18,140 --> 00:28:21,750 Or how can you, if you're thinking about plant growth, 443 00:28:21,750 --> 00:28:26,260 how do you account for temperature 444 00:28:26,260 --> 00:28:28,480 in degrees Fahrenheit 445 00:28:28,480 --> 00:28:32,120 holding degrees Celsius constant? 446 00:28:32,120 --> 00:28:34,190 So that's the intuition. 447 00:28:34,190 --> 00:28:38,330 So, also, it adds no new information, 448 00:28:38,330 --> 00:28:40,300 that if you know degrees Fahrenheit, 449 00:28:40,300 --> 00:28:45,220 then you automatically know degrees Celsius. 450 00:28:45,220 --> 00:28:48,630 And the degrees Celsius, one would say, 451 00:28:48,630 --> 00:28:50,853 adds no new information at all. 452 00:28:54,270 --> 00:28:57,380 We must also have a positive degree of freedom, 453 00:28:57,380 --> 00:29:01,430 so N must be strictly greater than K+1. 454 00:29:01,430 --> 00:29:05,490 So the number of observations must be strictly greater 455 00:29:05,490 --> 00:29:09,223 than the number of regressors plus one. 456 00:29:10,300 --> 00:29:14,320 Otherwise, you have more unknowns than equations, 457 00:29:14,320 --> 00:29:18,370 and you have either no or infinite solution. 458 00:29:18,370 --> 00:29:21,670 And the more degrees of freedom that you have, 459 00:29:21,670 --> 00:29:24,290 the lower the variance of beta. 460 00:29:24,290 --> 00:29:28,743 And here's a YouTube that explains that. 461 00:29:34,180 --> 00:29:37,020 There are many benefits to having a big N, 462 00:29:37,020 --> 00:29:38,920 to having a large sample size, 463 00:29:38,920 --> 00:29:43,240 and one of those is it increases your degrees of freedom. 464 00:29:43,240 --> 00:29:46,590 And as you see here in this table, 465 00:29:46,590 --> 00:29:51,590 the more degrees of freedom that you have, 466 00:29:51,860 --> 00:29:56,140 the lower the test stat has to be 467 00:29:56,140 --> 00:29:57,963 to be significant. 468 00:29:58,880 --> 00:30:02,250 And next week when we talk about hypothesis tests, 469 00:30:02,250 --> 00:30:04,633 I think this is gonna make even more sense. 470 00:30:06,010 --> 00:30:11,010 But basically, the higher degree of freedom, 471 00:30:11,180 --> 00:30:14,760 the bigger N leads to a more efficient estimator. 472 00:30:14,760 --> 00:30:16,390 And it's a theme that we're gonna revisit 473 00:30:16,390 --> 00:30:19,870 over and over again, more information 474 00:30:19,870 --> 00:30:21,900 leads to lower variance, 475 00:30:21,900 --> 00:30:24,940 or more information leads to 476 00:30:29,430 --> 00:30:33,870 a more efficient estimator, or a lower variance estimator. 477 00:30:33,870 --> 00:30:37,360 And having bigger N, getting information from more people, 478 00:30:37,360 --> 00:30:39,623 is one way that you can get more information. 479 00:30:44,630 --> 00:30:47,460 The next assumption, again, 480 00:30:47,460 --> 00:30:50,290 this is one we should be familiar with by now, 481 00:30:50,290 --> 00:30:53,070 that the expected value of the error term, 482 00:30:53,070 --> 00:30:55,980 no matter the value of X, is zero. 483 00:30:55,980 --> 00:31:00,610 And that no matter the value of X, 484 00:31:00,610 --> 00:31:03,020 the expected value is the same. 485 00:31:03,020 --> 00:31:06,150 It's this idea that the error term 486 00:31:06,150 --> 00:31:10,010 is uncorrelated with the regressors, 487 00:31:10,010 --> 00:31:13,973 and this is needed to have an unbiased estimator. 
488 00:31:14,920 --> 00:31:17,300 And that is why, as we'll see, 489 00:31:17,300 --> 00:31:19,630 that omitting an important variable 490 00:31:22,540 --> 00:31:25,423 will result in bias. 491 00:31:30,700 --> 00:31:33,740 When this zero conditional mean holds, 492 00:31:33,740 --> 00:31:37,110 we say that our regressors are explanatory. 493 00:31:37,110 --> 00:31:39,100 The variables are exogenous. 494 00:31:39,100 --> 00:31:41,430 That's what we want, exogenous is good. 495 00:31:41,430 --> 00:31:43,280 When they are correlated with the error term, 496 00:31:43,280 --> 00:31:44,750 they are said to be endogenous. 497 00:31:44,750 --> 00:31:48,020 And most, a lot of the topics 498 00:31:48,020 --> 00:31:50,270 that we'll be covering toward the end of class 499 00:31:50,270 --> 00:31:54,650 will deal with how to detect 500 00:31:54,650 --> 00:31:57,393 if they're endogenous and what to do about it. 501 00:32:02,190 --> 00:32:06,440 So, if we have these four assumptions holding, 502 00:32:06,440 --> 00:32:08,430 if all four are true, 503 00:32:08,430 --> 00:32:12,340 then every beta-hat is unbiased. 504 00:32:12,340 --> 00:32:17,000 And know that what this means is the procedure is unbiased. 505 00:32:17,000 --> 00:32:18,803 The model is unbiased. 506 00:32:20,628 --> 00:32:23,600 It doesn't mean that every single beta-hat 507 00:32:23,600 --> 00:32:26,810 will fall exactly on the true value. 508 00:32:26,810 --> 00:32:29,410 It means that there's no systematic reason 509 00:32:29,410 --> 00:32:32,780 why we should think it's too big or too small, 510 00:32:32,780 --> 00:32:35,000 and if we did this over and over again 511 00:32:35,000 --> 00:32:38,810 that the value would converge to its true value. 512 00:32:38,810 --> 00:32:41,123 And that's what unbiased means. 513 00:32:47,080 --> 00:32:52,080 We're gonna talk about two cases now under specification. 514 00:32:52,890 --> 00:32:56,697 What regressors should you include in your model? 515 00:32:56,697 --> 00:32:59,990 And we're gonna talk first about overspecifying, 516 00:32:59,990 --> 00:33:02,973 which is including irrelevant ones, 517 00:33:04,590 --> 00:33:06,090 like your shoe size 518 00:33:06,090 --> 00:33:09,070 or the number of letters in your dog's name 519 00:33:09,070 --> 00:33:10,230 or something like that. 520 00:33:10,230 --> 00:33:13,210 And sort of probably more seriously, 521 00:33:13,210 --> 00:33:18,120 what happens when you omit ones that you should include? 522 00:33:18,120 --> 00:33:21,113 And we'll see those omitted variable bias. 523 00:33:24,470 --> 00:33:27,430 First, we'll deal with the issue of overspecifying, 524 00:33:27,430 --> 00:33:32,100 which is including a variable that is irrelevant. 525 00:33:32,100 --> 00:33:33,320 It could be nonsense 526 00:33:33,320 --> 00:33:35,370 or just has nothing to do with the model. 527 00:33:40,180 --> 00:33:44,940 So suppose we specify this model with three regressors, 528 00:33:44,940 --> 00:33:48,500 and assumptions 1 through 4 are met, 529 00:33:48,500 --> 00:33:52,720 and everything is cool, but X3 has no effect. 530 00:33:52,720 --> 00:33:56,610 So X3 has no effect on Y, 531 00:33:56,610 --> 00:34:01,610 that in the true parameter in the population is zero. 532 00:34:02,100 --> 00:34:04,210 The slope is zero. 533 00:34:04,210 --> 00:34:08,563 Changing X3 has absolutely no effect on Y. 534 00:34:12,250 --> 00:34:13,193 What happens? 535 00:34:15,530 --> 00:34:17,520 Well, there's good news and bad news. 
536 00:34:17,520 --> 00:34:22,520 The good news is since, recall a few slides ago, 537 00:34:22,980 --> 00:34:27,120 that because B3 or beta-3 equals zero, 538 00:34:27,120 --> 00:34:28,790 it won't create bias. 539 00:34:28,790 --> 00:34:33,520 It won't effect any bias of beta-1 or beta-2. 540 00:34:33,520 --> 00:34:38,180 However, it will inflate the variance of the other betas. 541 00:34:38,180 --> 00:34:42,360 So it will increase the variance of beta-1 or beta-2. 542 00:34:42,360 --> 00:34:45,520 So, and there's a few ways that you can think about this. 543 00:34:45,520 --> 00:34:49,120 One, it takes away a degree of freedom for no reason. 544 00:34:49,120 --> 00:34:54,040 And two, it takes away some of the explanatory power 545 00:34:54,040 --> 00:34:55,810 of the other Xs. 546 00:34:55,810 --> 00:34:59,220 So especially if X3 547 00:34:59,220 --> 00:35:04,220 has any overlap with X2 and X1, 548 00:35:04,310 --> 00:35:07,300 it will take away some of the information 549 00:35:07,300 --> 00:35:11,960 that is in those variables 550 00:35:11,960 --> 00:35:16,150 and therefore take away their explanatory power. 551 00:35:16,150 --> 00:35:18,200 And we're gonna talk about this a bit more 552 00:35:18,200 --> 00:35:22,010 when we think about the variance of our beta-hats, 553 00:35:22,010 --> 00:35:25,833 which is sort of the end of this topic. 554 00:35:28,230 --> 00:35:32,910 So underspecifying is, in a sense, more serious 555 00:35:32,910 --> 00:35:36,140 because it creates bias. 556 00:35:36,140 --> 00:35:39,320 But sometimes we can know what the direction 557 00:35:39,320 --> 00:35:42,760 and maybe even size of the bias is. 558 00:35:42,760 --> 00:35:46,900 So assume that the true model is this, 559 00:35:46,900 --> 00:35:49,690 that only X2 and X1 560 00:35:49,690 --> 00:35:53,030 are the relevant regressors, 561 00:35:53,030 --> 00:35:55,680 and it's all well-behaved, 562 00:35:55,680 --> 00:35:57,663 assumptions 1 through 4 hold. 563 00:36:01,750 --> 00:36:04,160 We wanna know, what is beta-1? 564 00:36:04,160 --> 00:36:06,190 What is the effect of X1 on Y? 565 00:36:06,190 --> 00:36:10,970 However, we forget, for some reason, when we exclude X2, 566 00:36:10,970 --> 00:36:13,910 we don't know enough to include it, 567 00:36:13,910 --> 00:36:16,410 there is no data available, something like that, 568 00:36:16,410 --> 00:36:19,830 and so we run, instead, this regression 569 00:36:19,830 --> 00:36:21,870 with just a single regressor, X1. 570 00:36:21,870 --> 00:36:23,860 And I'm putting the A instead of B 571 00:36:23,860 --> 00:36:28,563 to sort of set this apart, so that it's clear, hopefully. 572 00:36:34,780 --> 00:36:37,950 So here's an example from the Wolters book. 573 00:36:37,950 --> 00:36:40,360 Again, we're looking at wages, 574 00:36:40,360 --> 00:36:43,743 and we're really interested in the returns to education. 575 00:36:45,050 --> 00:36:49,740 And we have these two regressors in the true model, 576 00:36:49,740 --> 00:36:51,410 education and ability. 577 00:36:51,410 --> 00:36:55,470 So you come with some innate ability, 578 00:36:55,470 --> 00:36:57,870 and you get education, 579 00:36:57,870 --> 00:37:00,350 and that's what drives your wage. 580 00:37:00,350 --> 00:37:02,020 Again, there's probably more, 581 00:37:02,020 --> 00:37:06,200 but just to make a simpler model. 582 00:37:06,200 --> 00:37:10,480 But we don't have a variable measuring ability. 
583 00:37:10,480 --> 00:37:14,737 So we run just a single regressor model, 00:37:16,130 --> 00:37:18,893 education, and we get A1. 00:37:20,590 --> 00:37:22,130 And that's what we see. 00:37:22,130 --> 00:37:26,600 And here, the error term, 00:37:26,600 --> 00:37:28,450 which we're calling v here, 00:37:28,450 --> 00:37:32,570 is beta-2 times ability plus the error, 00:37:32,570 --> 00:37:36,240 since we forgot about adding ability. 00:37:36,240 --> 00:37:41,240 And almost certainly, ability and education are correlated, 00:37:41,610 --> 00:37:46,550 that I would assume that those with more ability 00:37:46,550 --> 00:37:49,060 probably seek more education. 00:37:49,060 --> 00:37:51,020 That might be one hypothesis. 00:37:51,020 --> 00:37:55,083 But regardless of the direction, 00:37:56,840 --> 00:38:01,270 I think common sense says that your innate ability 00:38:01,270 --> 00:38:03,680 and how much education that you get 00:38:03,680 --> 00:38:06,013 would be somewhat correlated. 00:38:09,190 --> 00:38:11,950 So we can think about what's the magnitude 00:38:11,950 --> 00:38:14,393 and the direction of the bias in A1. 00:38:15,410 --> 00:38:19,070 So A1 is then beta-1 00:38:19,070 --> 00:38:21,970 plus beta-2 times d, 00:38:21,970 --> 00:38:25,990 where d is the slope of regressing X2 on X1. 00:38:25,990 --> 00:38:29,980 So it's how correlated are X1 and X2? 00:38:29,980 --> 00:38:31,230 What is the effect of it? 00:38:34,840 --> 00:38:39,780 And beta-hat-2 is the slope from the real model, 00:38:39,780 --> 00:38:41,953 had we been able to run this. 00:38:47,130 --> 00:38:51,660 So here, the expected value of A1 00:38:51,660 --> 00:38:55,350 is the expected value of beta-1-hat 00:38:55,350 --> 00:38:58,400 plus beta-2-hat times d. 00:38:58,400 --> 00:39:01,890 So the bias is this second term, 00:39:01,890 --> 00:39:04,370 beta-2-hat times d. 00:39:04,370 --> 00:39:06,200 Now, if 00:39:07,650 --> 00:39:10,140 beta-2-hat equals zero, 00:39:10,140 --> 00:39:13,860 so if ability has no effect on wages, 00:39:13,860 --> 00:39:15,960 or if d equals zero, 00:39:15,960 --> 00:39:20,500 that is, ability and education are uncorrelated, 00:39:20,500 --> 00:39:22,560 then A1 is unbiased. 00:39:22,560 --> 00:39:23,800 Then we're fine. 00:39:23,800 --> 00:39:27,393 And we sorta talked about that a few slides ago. 00:39:35,340 --> 00:39:37,350 So again, if d equals zero, 00:39:37,350 --> 00:39:38,993 then X1 and X2 are uncorrelated. 00:39:41,508 --> 00:39:43,091 And that would mean 00:39:45,430 --> 00:39:48,220 that the expected value of X2 given X1 00:39:48,220 --> 00:39:49,570 is just the expected value of X2. 00:39:49,570 --> 00:39:51,283 And then X2, 00:39:52,910 --> 00:39:56,170 even though it sits 00:39:56,170 --> 00:40:00,047 in the error term, 00:40:00,047 --> 00:40:02,540 is not correlated with X1, so it 00:40:02,540 --> 00:40:07,540 does not violate our assumption 4, and everything is cool. 00:40:07,540 --> 00:40:11,390 But again, that's going to be rather rare. 00:40:11,390 --> 00:40:15,440 And in our example of omitting ability, 00:40:15,440 --> 00:40:17,900 I think, and hope, you can see 00:40:17,900 --> 00:40:20,503 that that would not be very good reasoning.
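To make that bias formula concrete, here is a hedged simulation sketch (all names, coefficient values, and sample sizes are made up for illustration; this is not data from the textbook). It draws many samples in which ability truly matters and is positively correlated with education, runs the short regression that omits ability each time, and shows that the A1 estimates center on beta-1 plus beta-2 times d rather than on beta-1.

```python
import numpy as np

rng = np.random.default_rng(3)
beta0, beta1, beta2 = 1.0, 0.5, 0.8      # "true" values, chosen for illustration
a1_estimates = []

for _ in range(2000):                    # repeated sampling, as in the definition of bias
    n = 400
    ability = rng.normal(size=n)
    # d > 0 by construction: more ability goes with more education.
    educ = 12 + 2 * ability + rng.normal(size=n)
    wage = beta0 + beta1 * educ + beta2 * ability + rng.normal(size=n)

    # Short regression: ability is omitted, so it sits in the error term.
    X_short = np.column_stack([np.ones(n), educ])
    a_hat, *_ = np.linalg.lstsq(X_short, wage, rcond=None)
    a1_estimates.append(a_hat[1])

d = 2 / 5    # slope from regressing ability (X2) on educ (X1), implied by the setup above
print("average A1 over samples:", np.mean(a1_estimates))
print("beta-1 plus beta-2 times d:", beta1 + beta2 * d)   # these two match, not beta-1
```

Setting beta2 to zero, or dropping the 2 * ability term from educ so that d is zero, makes the bias disappear, which matches the two exceptions just described.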
634 00:40:23,410 --> 00:40:27,740 So we can also think about what is the direction of it 00:40:28,900 --> 00:40:31,940 and maybe even the magnitude. 00:40:31,940 --> 00:40:33,680 We can use our intuition. 00:40:33,680 --> 00:40:36,870 So if d is greater than zero, 00:40:36,870 --> 00:40:40,940 X2 and X1 are positively correlated, 00:40:40,940 --> 00:40:43,670 folks that have more ability seek education, 00:40:43,670 --> 00:40:48,670 that would be my guess, but maybe that isn't true. 00:40:48,880 --> 00:40:51,250 If it's less than zero, 00:40:51,250 --> 00:40:54,163 people with more ability get less education. 00:40:55,490 --> 00:40:58,670 So you need to look at the effect of X2 on Y 00:40:58,670 --> 00:41:02,460 as well as the effect of X2 on X1, 00:41:02,460 --> 00:41:04,293 and we can sort of intuit this. 00:41:08,120 --> 00:41:11,080 So again, this is our real model, 00:41:11,080 --> 00:41:15,540 where education and ability are both included. 00:41:15,540 --> 00:41:18,820 So if beta-2 is greater than zero, 00:41:18,820 --> 00:41:22,660 which means more ability leads to higher wage, 00:41:22,660 --> 00:41:23,980 that would be my guess, 00:41:23,980 --> 00:41:26,940 and d is also greater than zero, 00:41:26,940 --> 00:41:31,940 more ability leads to more education, 00:41:31,990 --> 00:41:35,380 and that means that the bias is positive, 00:41:35,380 --> 00:41:40,380 that the estimated effect of education would be too large. 00:41:44,727 --> 00:41:47,230 A1 is greater than beta-1, 00:41:47,230 --> 00:41:51,193 so we overstate the effect of education. 00:41:56,620 --> 00:42:00,770 When we have a k variable equation, 00:42:00,770 --> 00:42:05,320 it's gonna depend again on how correlated 00:42:05,320 --> 00:42:10,220 the omitted variable is with the ones that we include, 00:42:10,220 --> 00:42:13,070 but know that every beta may be biased, 00:42:13,070 --> 00:42:17,057 not just those correlated with the omitted regressor. 00:42:20,410 --> 00:42:23,420 Here is how I approach it in general. 00:42:23,420 --> 00:42:26,710 I tend to use a big tent approach 00:42:26,710 --> 00:42:30,420 and include a lot of Xs, at least in the early model. 00:42:30,420 --> 00:42:35,420 So you always wanna base it on these three things, 00:42:35,610 --> 00:42:38,260 previous research, theory, 00:42:38,260 --> 00:42:40,870 and just sort of common sense and introspection. 00:42:40,870 --> 00:42:41,770 So you think about 00:42:43,418 --> 00:42:46,940 what does previous research suggest we include? 00:42:46,940 --> 00:42:48,940 What does theory suggest we include? 00:42:48,940 --> 00:42:52,823 And what does common sense suggest that we include? 00:42:54,000 --> 00:42:58,240 And we will learn down the road with hypothesis tests 00:42:58,240 --> 00:43:02,480 how to pare down and how to sort of come to the right model, 00:43:02,480 --> 00:43:05,900 to start with a lot of Xs, 00:43:05,900 --> 00:43:08,350 and there are tests to see, 00:43:08,350 --> 00:43:11,923 well, if we take these out, which is the best model? 00:43:15,900 --> 00:43:18,810 We're going to talk about the variance 00:43:18,810 --> 00:43:20,260 of the OLS estimators. 00:43:20,260 --> 00:43:23,013 What is the variance of beta-hat? 00:43:26,070 --> 00:43:28,580 We've spent a bunch of time 00:43:28,580 --> 00:43:32,290 looking at the assumptions under which they are unbiased.
682 00:43:32,290 --> 00:43:35,430 And now we wanna think about how efficient are they 683 00:43:35,430 --> 00:43:38,980 'cause we want the best unbiased estimator. 684 00:43:38,980 --> 00:43:43,100 So beta-hat 685 00:43:43,940 --> 00:43:48,270 has variance because it has the error term in them 686 00:43:48,270 --> 00:43:50,523 because it has the observed Y in them. 687 00:43:51,670 --> 00:43:54,340 So each time you, 688 00:43:54,340 --> 00:43:57,560 so even if you've specified the model right, 689 00:43:57,560 --> 00:44:01,450 each time you draw a sample and run it, 690 00:44:01,450 --> 00:44:05,810 you're going to get a slightly different beta-hat 691 00:44:07,525 --> 00:44:11,083 because there's a new set of error terms that are drawn. 692 00:44:13,510 --> 00:44:18,510 We're going to assume now that there is homoscedasticity, 693 00:44:19,100 --> 00:44:24,100 which has that the variance of the error term is a constant. 694 00:44:24,770 --> 00:44:26,680 We're gonna deal with that later 695 00:44:26,680 --> 00:44:28,620 of what do we do when that's not true, 696 00:44:28,620 --> 00:44:31,870 how to test for it and how to account for it. 697 00:44:31,870 --> 00:44:34,100 But now we're going to assume 698 00:44:34,100 --> 00:44:37,820 that the variance of u, given any value of X, 699 00:44:37,820 --> 00:44:41,053 equals a constant, which we call sigma squared. 700 00:44:42,150 --> 00:44:46,030 That is, it's the same, regardless of any value of X, 701 00:44:46,030 --> 00:44:50,860 so no value of X will change the variance 702 00:44:52,510 --> 00:44:57,510 of u, so sort of the shape of the bell curve. 703 00:44:57,940 --> 00:45:00,640 We know it's centered over zero, 704 00:45:00,640 --> 00:45:03,640 but it could be a very tall, skinny bell curve. 705 00:45:03,640 --> 00:45:06,250 It could be a very short, fat bell curve. 706 00:45:06,250 --> 00:45:08,350 The key here, and that's basically 707 00:45:08,350 --> 00:45:11,560 what we're trying to measure, but the assumption here 708 00:45:11,560 --> 00:45:13,970 is that no matter what the value of X is, 709 00:45:13,970 --> 00:45:16,890 that the shape of that bell curve is the same. 710 00:45:21,220 --> 00:45:24,410 Here's the formula for the variance of beta-hat-j. 711 00:45:24,410 --> 00:45:29,410 So you take one particular regressor, Xj. 712 00:45:29,430 --> 00:45:32,200 What is the variance of its beta-hat? 713 00:45:32,200 --> 00:45:33,810 And here it is. 714 00:45:33,810 --> 00:45:37,483 So it has three parts, sigma squared, 715 00:45:38,320 --> 00:45:41,823 SSTj, and one minus R squared j. 716 00:45:42,770 --> 00:45:46,000 So we're gonna think about sigma squared, 717 00:45:46,000 --> 00:45:49,380 that that is, that's the variance of u, 718 00:45:49,380 --> 00:45:51,500 which we already saw, and we're gonna see in a bit 719 00:45:51,500 --> 00:45:52,683 how to measure that. 720 00:45:55,150 --> 00:45:56,980 There's also SSTj, 721 00:45:56,980 --> 00:46:01,910 which is the total sample variation of X. 722 00:46:01,910 --> 00:46:06,870 So it is the sum of every X, 723 00:46:06,870 --> 00:46:10,910 so what everybody said on the survey 724 00:46:10,910 --> 00:46:13,660 times the mean of that X, 725 00:46:13,660 --> 00:46:16,653 so the mean value from our sample, 726 00:46:17,730 --> 00:46:22,730 subtract every individual's Xj from the mean of Xj, 727 00:46:23,350 --> 00:46:26,340 square it, and sum it up from one to N. 
728 00:46:26,340 --> 00:46:29,390 So for each of our N respondents, 729 00:46:29,390 --> 00:46:34,390 it's the total sum of squared, so squared X. 730 00:46:35,010 --> 00:46:38,620 And R squared j is the R squared 731 00:46:38,620 --> 00:46:41,010 of if you were to take Xj 732 00:46:41,010 --> 00:46:44,210 and regress it on all the other Xs. 733 00:46:44,210 --> 00:46:45,803 So if we're looking at X1, 734 00:46:48,510 --> 00:46:49,800 it would be the R squared. 735 00:46:49,800 --> 00:46:53,370 If we took X1 and put it on the left-hand side 736 00:46:53,370 --> 00:46:56,603 and regressed X2, X3, X4,...,Xk, 737 00:46:58,750 --> 00:47:02,590 and look at that R squared, that is what this is. 738 00:47:02,590 --> 00:47:06,160 So it's basically a measure of how correlated 739 00:47:06,160 --> 00:47:09,360 is this Xj with all of the others? 740 00:47:09,360 --> 00:47:12,990 A high R squared implies that they're highly correlated. 741 00:47:12,990 --> 00:47:15,883 A low R squared j means that they are not. 742 00:47:19,890 --> 00:47:21,070 Why does this matter? 743 00:47:21,070 --> 00:47:23,870 Why do we care about variance? 744 00:47:23,870 --> 00:47:26,600 Well, if we have a high variance, 745 00:47:26,600 --> 00:47:27,630 so if you can think about it 746 00:47:27,630 --> 00:47:32,630 as kind of a short, fat bell curve, 747 00:47:34,080 --> 00:47:37,030 we have a less precise estimator. 748 00:47:37,030 --> 00:47:41,080 There's a need for larger confidence intervals. 749 00:47:41,080 --> 00:47:44,440 And thus, we're less likely to find significance. 750 00:47:44,440 --> 00:47:47,146 So if you run a regression, 751 00:47:47,146 --> 00:47:49,770 and you don't find anything significant, 752 00:47:49,770 --> 00:47:51,620 it's kind of deflating. 753 00:47:51,620 --> 00:47:53,300 It's like, er, this is, 754 00:47:53,300 --> 00:47:57,580 it's not very interesting. 755 00:47:57,580 --> 00:47:59,953 So even from a really practical matter, 756 00:48:03,010 --> 00:48:06,630 you wanna find things, if they are significant, 757 00:48:06,630 --> 00:48:11,030 you want to find that 'cause that's sort of 758 00:48:12,160 --> 00:48:14,813 what's interesting to talk about. 759 00:48:18,250 --> 00:48:22,563 So I will go over each of the three components of variance. 760 00:48:27,330 --> 00:48:29,853 First is sigma squared. 761 00:48:30,830 --> 00:48:33,880 So a higher, note that in the formula, 762 00:48:33,880 --> 00:48:36,710 higher sigma squared means higher variance. 763 00:48:36,710 --> 00:48:39,820 It means that the error terms are all over the place. 764 00:48:39,820 --> 00:48:43,693 It means that we have a short, fat, 765 00:48:45,710 --> 00:48:47,893 very spread out bell curve. 766 00:48:48,830 --> 00:48:50,040 Another way of thinking about it 767 00:48:50,040 --> 00:48:52,230 is more noise in the equation 768 00:48:52,230 --> 00:48:55,690 makes it harder to predict partial effects. 769 00:48:55,690 --> 00:49:00,240 And know that it is a population measure. 770 00:49:00,240 --> 00:49:03,090 It's independent of the sample size. 771 00:49:03,090 --> 00:49:07,810 And it is unknown, but there is a way to estimate it, 772 00:49:07,810 --> 00:49:09,210 which we'll see in a minute. 773 00:49:15,415 --> 00:49:16,960 SSTj, again, 774 00:49:16,960 --> 00:49:20,773 is the total variation in X. 775 00:49:22,160 --> 00:49:26,410 And the more variation in X, 776 00:49:26,410 --> 00:49:28,260 the smaller the variance. 777 00:49:28,260 --> 00:49:31,300 It's in the denominator, so you want a big SSTj. 
778 00:49:31,300 --> 00:49:35,280 So that means that you want Xs 779 00:49:35,280 --> 00:49:37,533 to have some variation. 780 00:49:38,410 --> 00:49:42,903 So if Xj here is age, 781 00:49:44,420 --> 00:49:46,163 you want your sample, 782 00:49:47,259 --> 00:49:49,680 you always want your sample to be random, 783 00:49:49,680 --> 00:49:52,623 but if you're drawing from, 784 00:49:54,660 --> 00:49:59,220 folks from across the age spectrum 785 00:49:59,220 --> 00:50:02,340 from, say, 18 to 99, 786 00:50:02,340 --> 00:50:05,840 the Xs are going to be more spread out. 787 00:50:05,840 --> 00:50:08,440 And the partial effect of age 788 00:50:08,440 --> 00:50:11,930 is going to be a lot easier to predict. 789 00:50:11,930 --> 00:50:14,030 So you can think about it that 790 00:50:14,030 --> 00:50:18,340 if you're trying to eyeball the slope of a line, 791 00:50:18,340 --> 00:50:22,430 and all of the dots are all sort of converged 792 00:50:22,430 --> 00:50:26,070 around a single X, so you only have folks 793 00:50:26,070 --> 00:50:30,230 that are 31, 30, 31, 32, 31, 30, 794 00:50:30,230 --> 00:50:31,670 it's gonna be hard to sort of, 795 00:50:31,670 --> 00:50:33,583 what is the slope of this line? 796 00:50:36,220 --> 00:50:37,800 This is another example 797 00:50:37,800 --> 00:50:41,250 where more info leads to lower variance. 798 00:50:41,250 --> 00:50:45,060 And increasing sample size unambiguously 799 00:50:46,620 --> 00:50:51,280 increases the variation in X 800 00:50:51,280 --> 00:50:55,850 because we're not dividing by N here. 801 00:50:55,850 --> 00:50:57,660 It's just summing them up. 802 00:50:57,660 --> 00:50:59,573 So the more Xs that you have, 803 00:51:00,900 --> 00:51:03,453 unless every single one is on the mean, 804 00:51:04,800 --> 00:51:08,750 adding N will make SST get bigger 805 00:51:08,750 --> 00:51:10,823 and will decrease your variance. 806 00:51:16,400 --> 00:51:21,400 R squared j is the R squared that if you would take Xj 807 00:51:21,780 --> 00:51:24,610 and put it on the left side and run a regression 808 00:51:24,610 --> 00:51:26,250 with all the other Xs on the right side, 809 00:51:26,250 --> 00:51:27,740 what's the R squared? 810 00:51:27,740 --> 00:51:31,430 So if R squared j is one, 811 00:51:31,430 --> 00:51:34,343 we have a perfect linear combination. 812 00:51:36,844 --> 00:51:40,060 And you'll see that we're dividing by zero. 813 00:51:40,060 --> 00:51:42,193 And again, it makes this blow up too. 814 00:51:43,260 --> 00:51:47,420 And in this case, Xj adds no new information. 815 00:51:47,420 --> 00:51:51,040 So you want Xj 816 00:51:51,040 --> 00:51:54,850 to say something that the other Xs don't say. 817 00:51:54,850 --> 00:51:59,290 And the more it says something that the other Xs don't say, 818 00:51:59,290 --> 00:52:02,130 the more new information that you're getting here, 819 00:52:02,130 --> 00:52:04,823 and the lower the variance. 820 00:52:05,670 --> 00:52:08,010 There's a way, and I will show you, 821 00:52:08,010 --> 00:52:13,010 to calculate the variance inflation factor, 822 00:52:13,023 --> 00:52:14,203 or the VIF. 823 00:52:14,203 --> 00:52:18,363 And in SPSS, you'll find it under collinearity diagnostics. 824 00:52:19,630 --> 00:52:24,233 And the VIF is one divided by one minus R squared j. 825 00:52:25,430 --> 00:52:27,060 If it's less than four, 826 00:52:27,060 --> 00:52:30,700 this is just sort of a rule of thumb I've learned, 827 00:52:30,700 --> 00:52:34,200 if your VIF is less than four, it really isn't a problem. 
828 00:52:34,200 --> 00:52:36,700 If it's less than 10, it isn't a major problem.
829 00:52:36,700 --> 00:52:40,223 If it's more than 10, you may have a problem.
830 00:52:46,130 --> 00:52:47,710 Here, I will show you
831 00:52:47,710 --> 00:52:51,980 that if you exclude a relevant variable,
832 00:52:51,980 --> 00:52:56,630 it introduces bias, but it also decreases the variance.
833 00:52:56,630 --> 00:52:59,190 So there's kind of a trade-off here.
834 00:52:59,190 --> 00:53:02,983 So think back to our k=2 example,
835 00:53:04,140 --> 00:53:06,853 where the real model, say, includes both X1 and X2,
836 00:53:08,010 --> 00:53:10,293 but we run another model
837 00:53:11,140 --> 00:53:14,890 where we exclude X2.
838 00:53:14,890 --> 00:53:17,920 So beta-1-hat is where we include X2,
839 00:53:17,920 --> 00:53:22,533 and its variance is the formula that we know well.
840 00:53:25,039 --> 00:53:26,580 In the bottom box here,
841 00:53:26,580 --> 00:53:31,580 the variance of A1, where we exclude X2, no longer has the one minus R squared 1 term.
842 00:53:35,848 --> 00:53:37,233 Since R squared 1
843 00:53:38,130 --> 00:53:41,680 is always a number between zero and one,
844 00:53:41,680 --> 00:53:46,460 one minus that number is also a number between zero and one.
845 00:53:46,460 --> 00:53:49,310 And by including that term
846 00:53:49,310 --> 00:53:52,133 in the third box down,
847 00:53:53,860 --> 00:53:55,210 you're dividing
848 00:53:55,210 --> 00:53:58,840 by a number less than one,
849 00:53:58,840 --> 00:54:02,350 which is like multiplying by a number greater than one.
850 00:54:02,350 --> 00:54:06,380 So the variance of A1 is going to be less
851 00:54:06,380 --> 00:54:08,810 than the variance of beta-1-hat.
852 00:54:08,810 --> 00:54:11,853 So that's the trade-off, and that's why it happens.
853 00:54:17,080 --> 00:54:20,033 So here's more about that.
854 00:54:21,520 --> 00:54:26,117 And it depends on how much new information X2 adds.
855 00:54:27,420 --> 00:54:30,290 The more new information,
856 00:54:30,290 --> 00:54:34,380 the smaller this R squared is,
857 00:54:34,380 --> 00:54:38,300 and the less effect it has on the variance.
858 00:54:38,300 --> 00:54:40,540 So you can play with the math
859 00:54:40,540 --> 00:54:44,423 and see if you can see what is going on here.
860 00:54:48,760 --> 00:54:52,430 So the variance of A1 is always smaller
861 00:54:52,430 --> 00:54:55,150 than the variance of beta-1-hat,
862 00:54:55,150 --> 00:54:58,280 unless X2 is uncorrelated with X1,
863 00:54:58,280 --> 00:55:00,123 and then it would be the same.
864 00:55:01,200 --> 00:55:05,780 And if X1 and X2
865 00:55:05,780 --> 00:55:08,513 are correlated,
866 00:55:13,070 --> 00:55:16,740 the bias trade-off depends on whether beta-2
867 00:55:18,270 --> 00:55:20,373 is zero or not.
868 00:55:21,220 --> 00:55:25,510 But the variance of beta-1-hat
869 00:55:25,510 --> 00:55:27,263 is always going to be greater.
870 00:55:33,500 --> 00:55:38,090 The bottom line is that adding an irrelevant variable
871 00:55:38,090 --> 00:55:43,003 exacerbates multicollinearity and increases variance.
872 00:55:44,520 --> 00:55:47,840 And adding observations can decrease variance,
873 00:55:47,840 --> 00:55:49,903 but it doesn't address bias.
874 00:55:52,800 --> 00:55:55,120 To calculate the variance,
875 00:55:55,120 --> 00:55:58,150 we need to estimate sigma squared.
876 00:55:58,150 --> 00:56:02,140 So we want an unbiased estimator of that,
877 00:56:02,140 --> 00:56:04,140 which is sigma squared hat.
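As a sanity check on that bias-versus-variance trade-off, here is a small simulation sketch in Python; the coefficient values, sample size, and variable names are all assumptions made up for illustration. It holds the regressors fixed, redraws the error many times, and compares the estimator that includes X2 (beta-1-hat) with the one that omits it (A1).

import numpy as np

rng = np.random.default_rng(2)
n, reps = 200, 5000
b0, b1, b2 = 1.0, 0.5, 0.8            # "true" coefficients, assumed for the simulation

# fix the regressors across replications (the variance comparison above is
# conditional on the Xs), with x1 and x2 correlated by construction
x1 = rng.normal(0, 1, n)
x2 = 0.6 * x1 + rng.normal(0, 1, n)
X_long = np.column_stack([np.ones(n), x1, x2])   # include x2 -> beta_1_hat
X_short = np.column_stack([np.ones(n), x1])      # omit x2    -> A1

long_est, short_est = [], []
for _ in range(reps):
    y = b0 + b1 * x1 + b2 * x2 + rng.normal(0, 1, n)   # redraw only the error term
    long_est.append(np.linalg.lstsq(X_long, y, rcond=None)[0][1])
    short_est.append(np.linalg.lstsq(X_short, y, rcond=None)[0][1])

print("include x2: mean %.3f  var %.5f" % (np.mean(long_est), np.var(long_est)))
print("omit x2:    mean %.3f  var %.5f" % (np.mean(short_est), np.var(short_est)))
# expected pattern: omitting x2 gives the smaller variance but a biased mean
# (well away from b1 = 0.5), while including x2 is unbiased with larger variance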
878 00:56:04,140 --> 00:56:08,417 And we do this by using the residuals,
879 00:56:09,400 --> 00:56:12,840 which are, for every person,
880 00:56:12,840 --> 00:56:15,300 ui-hat equals Yi minus Y-i-hat,
881 00:56:15,300 --> 00:56:19,810 so what they actually said versus the predicted value
882 00:56:19,810 --> 00:56:22,613 based on what their Xs are.
883 00:56:31,807 --> 00:56:34,170 Sigma squared hat is the sum
884 00:56:34,170 --> 00:56:38,630 of the ui-hat squared
885 00:56:38,630 --> 00:56:41,630 normalized by the degrees of freedom.
886 00:56:41,630 --> 00:56:46,520 So it's the sum of squared residuals
887 00:56:46,520 --> 00:56:49,600 divided by n minus k minus one.
888 00:56:49,600 --> 00:56:53,180 And note that as N goes to infinity,
889 00:56:53,180 --> 00:56:55,470 sigma squared hat settles in around the true sigma squared,
890 00:56:55,470 --> 00:56:57,280 while SSTj keeps growing.
891 00:56:57,280 --> 00:57:00,760 So again, a bigger N
892 00:57:00,760 --> 00:57:04,270 makes the overall variance
893 00:57:04,270 --> 00:57:08,163 of beta-1-hat smaller and smaller.
894 00:57:15,040 --> 00:57:17,880 The standard error of the regression
895 00:57:17,880 --> 00:57:20,840 is the positive square root of this estimate,
896 00:57:20,840 --> 00:57:25,840 the square root of sigma-hat squared.
897 00:57:25,890 --> 00:57:29,510 And this estimate is unbiased
898 00:57:29,510 --> 00:57:33,970 only if assumption 5,
899 00:57:33,970 --> 00:57:35,943 homoscedasticity, holds.
900 00:57:37,220 --> 00:57:40,600 So now you could, in theory,
901 00:57:40,600 --> 00:57:45,090 calculate each of the components
902 00:57:45,090 --> 00:57:50,090 and calculate the variance of beta-1-hat.
903 00:57:52,970 --> 00:57:57,940 So here is our final problem set for this,
904 00:57:57,940 --> 00:58:01,410 and I want you to think about collinearity,
905 00:58:01,410 --> 00:58:04,210 about omitted variable bias,
906 00:58:04,210 --> 00:58:07,990 and the variance of beta-hat,
907 00:58:07,990 --> 00:58:12,070 and, as each one increases or decreases,
908 00:58:12,070 --> 00:58:14,583 what happens to the variance and why?
909 00:58:17,530 --> 00:58:18,900 This is what we did.
910 00:58:18,900 --> 00:58:23,630 Thanks for watching this,
911 00:58:23,630 --> 00:58:25,533 and have a good day.
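As a short follow-up to the sigma squared hat discussion above, here is a minimal sketch in Python with statsmodels of the estimator SSR / (n - k - 1) and the standard error of the regression; the simulated data and coefficient values are assumptions for illustration, not the course's SPSS output.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n, k = 500, 2
x1 = rng.normal(12, 2, n)
x2 = 40 - 0.8 * x1 + rng.normal(0, 3, n)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(0, 2, n)   # true sigma = 2 (assumed)

X = sm.add_constant(np.column_stack([x1, x2]))
fit = sm.OLS(y, X).fit()

ssr = np.sum(fit.resid ** 2)          # sum of squared residuals
sigma2_hat = ssr / (n - k - 1)        # unbiased under assumptions 1 through 5
ser = np.sqrt(sigma2_hat)             # standard error of the regression

print(sigma2_hat, ser)
print(fit.mse_resid)                  # statsmodels' own SSR / df_resid should match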