- [Instructor] Hello and welcome to the lecture on classical linear regression. We're going to look at the case where we have one x in the linear model. It's the simplest case, and we'll build on it as we go.

So this is what we're gonna do: we're gonna look at the single-regressor example. Again, there are really not a lot of real-life examples where you can only have one x, but understanding the properties of this simplest model provides the building blocks that we can use as we add more regressors. So we'll start with that. We're gonna look at the betas, at the disturbances, or the error term, and at the OLS estimator: how does ordinary least squares derive the value of beta based on the data that we have? And we'll look at the five assumptions which lead to the OLS estimator being the best linear unbiased estimator that we can get when these five things hold.

So here is problem set 2. First: what's the relationship between beta naught and the expected value of the disturbance term?
Second, I want you to think about the single-regressor example: what does beta 1 hat measure? What does it mean? Give an example as needed. Third: describe the intuition of how beta 1 hat is derived. Fourth: what is the major implication of the linearity of this model? And last: what are the five assumptions, what is the major implication if they hold, and what is an example of a violation if they do not hold?

And then I would also like you to do two computer problems in SPSS, C2.1 and C2.2 in the Wooldridge text. I will provide the data and the questions.

So our simplest regression is when k equals 1, when we have one regressor, and you can see, here's our model: y equals beta naught plus beta 1 x 1 plus u. So y is known as the dependent variable; most commonly, that's what we're gonna call it. You may also see the explained, the response, the predicted, et cetera. x is the independent variable, sometimes called the regressor, and u is the error term, or the disturbance.

So this model has two betas.
So beta naught is the intercept, or the constant term. You can think of it as: if x equals zero, what's the value of y? Sometimes this has a sensible interpretation, sometimes not; it really depends on the nature of the model.

And beta 1 is the slope parameter. Usually, this is the figure that we're most interested in in econometrics; this is the primary relationship. And that is because y will change by beta 1 units if we change x by 1, as long as our error term does not change. We can express that as delta y equals beta 1 delta x. So if x changes by five units, the predicted value of y will change by five times beta 1.

An important thing about this is that the linearity assumption means that no matter the value of x, big or small, and whether you change x by a little or a lot, it doesn't matter where you start. The slope of the line doesn't change. A one-unit change in x will have the same effect on y, no matter where you start.
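To make that linearity concrete, here is a tiny numeric sketch. The slope value is made up purely for illustration:

```python
# Hypothetical slope for illustration only: beta_1 = 2.0.
beta_1 = 2.0

def delta_y(delta_x):
    # Linearity: delta y = beta_1 * delta x, holding the error term fixed.
    # Note that the starting value of x never enters the formula.
    return beta_1 * delta_x

one_unit = delta_y(1)    # a 1-unit change in x
five_units = delta_y(5)  # a 5-unit change in x is just five times as large
```

The point of the sketch is that the predicted change depends only on the size of the change in x, never on where x started.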
So one of the fundamental assumptions is that the expected value of our error term is zero. We assume that it has a mean of zero: if we draw over and over and over again, the mean is going to converge to zero. And mathematically, this is pretty easy to make happen: we can slide the intercept up and down to make sure that this holds true. So in a sense, we normalize everything else, especially the beta naught (the alpha term in the box on the right here), so that it's true. It's a very fundamental part of econometrics that the expected value of the error term is always zero.

Not only is the expected value of the error term always zero; the error term is also uncorrelated with x. This will be true in this model where we have one x, and with as many x's as we have: the error term is uncorrelated with every x. Another way of saying that is that the error term is mean independent of x. No matter what value of x you may have, so no matter what x someone might answer on a survey, the expected value of the error term is always the same.
Knowing the value of x does not tell you any information about the value of the error term; they are independent mathematically. You could also say that they are orthogonal: they don't have anything to do with each other, they have no correlation. So knowing something about x does not tell you anything about the error term.

Putting these two things together, from the last two slides, it means that the expected value of the error term given x always equals zero. This is called the conditional mean assumption. Again, no matter what value of x the person gives on the survey, the expected value of the error term for that person will always be zero. And therefore the expected value of y given x is always beta naught plus beta 1 x, since the expected value of the error term is zero. This actually forms the regression line, which we'll see in a few minutes. And again, a one-unit change in x changes the expected value of y by beta 1 units.
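A minimal simulation sketch of the conditional mean assumption, with made-up parameter values: if u is drawn with mean zero, independently of x, then the sample mean of the errors converges toward zero as we draw over and over.

```python
import random

random.seed(0)

# Made-up population parameters for this sketch.
beta_0, beta_1 = 1.0, 0.5

# Draw u independently of x with mean zero, so E[u | x] = 0 and
# E[y | x] = beta_0 + beta_1 * x.
xs = [random.uniform(0, 10) for _ in range(100_000)]
us = [random.gauss(0, 1) for _ in xs]  # mean zero, unrelated to x
ys = [beta_0 + beta_1 * x + u for x, u in zip(xs, us)]

# Drawing over and over again, the mean of the errors converges to zero.
mean_u = sum(us) / len(us)
```

With this many draws, `mean_u` lands very close to zero, which is the "draw over and over and the mean converges to zero" idea from the lecture.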
So what this means is, on average, for the average person, or for the expected value, if you change x by one unit, it will change y by exactly beta 1. That holds for the expected value, sort of on average: if we were to forecast what the effect is, that's what it would be. And it doesn't matter if our x is large or small, because it is a linear relationship.

So the lack of correlation between u and x is extremely important in econometrics. That might be the single most important thing that we learn here; it's the fundamental assumption. And to see why, let's assume that it is not true. Let's assume that u changes as x changes, and therefore du/dx, or delta u over delta x, does not equal zero. What we wanna know in econometrics is: what is the change in the expected value, the y hat, as x changes? How does a change in the value of x change the predicted, or expected, value of y? But we don't actually observe the expected value of y. All we observe is y.
And so if we change x and y changes, by looking at different data points, the change in y that we see may be due to the change in the expected value, but it may also be a result of the change in the error term. And we can't tell, and we can't pull those things apart. We can't see that this part is due to the expected value and that part is due to the error term, because all that we observe is y. What we really wanna know is beta, which is d(y hat)/dx. But if du/dx does not equal zero, we cannot make that forecast. This is a concept that we're gonna revisit a lot, but it's a very important one.

And so the interpretation, then, is that beta 1 is the change in y for the average person. It's the forecasted value, the expected value, but it may not be true in every case. So if x is income and y is expenditure, if you put an extra dollar in my pocket, I will spend beta 1 of that on the good that we are wondering about. That's the expected value; I may or may not spend exactly that much, but that is how we interpret this.
So now that we know some of the uses of our estimate of beta, let's think about how we derive it. Suppose we have our two-question survey: we ask x, we have y, and we have this linear model. Then beta 1 hat is a ratio. In the numerator (look over here at the second equation, in the box), we have the sum, over all observations, of each x minus the mean of x, times each y minus the mean of y. And in the denominator, we have the sum of each x minus the mean of x, squared. So another way of saying it: it's the covariance of x and y divided by the variance of x.

One of the ways that I like to think of it is as the mean slope. This is just an intuition, just one way that helps me think of it: the slope of a line is the change in y over the change in x. So in that second equation, you can kind of see the change in y divided by the change in x. It's sort of the average slope of how y changes as x changes.
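As a sketch in code, with made-up toy data, the slope formula can be computed directly: the numerator is the "covariance part" and the denominator is the "variance part" from the slide.

```python
# Made-up toy data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# beta_1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # covariance part
den = sum((x - x_bar) ** 2 for x in xs)                        # variance part

beta_1_hat = num / den
beta_0_hat = y_bar - beta_1_hat * x_bar  # intercept recovered from the means
```

With this toy data the slope comes out to 1.98, the "average slope" of y against x.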
Another way of thinking about it: for the covariance of x and y, the intuition is, to what extent do these two variables sort of march in lockstep or not? If one gets bigger, does the other get smaller, or the other way around? Do they both get bigger or both get smaller, or do they have no relationship? Is the relationship strong or weak, et cetera? And then that is divided by, or normalized by, the overall variation in x. So: how do these two things change together, normalized by how x itself varies? So those are a few ways of thinking about both the intuition and the math of the OLS estimator, this beta hat.

So what the OLS estimator does is, it minimizes the sum of squared residuals. It takes everybody's observation and it chooses beta so that the distance between their actual answer, what they actually said, y, and their predicted value, squared, is as small as possible. It minimizes the square of the distance between y and y hat.
Another way of thinking about it: specifically, it minimizes the sum of the u i squared. You take everybody's u i, square it, add them up, and beta OLS makes that number as small as it can. So it's just like an optimization problem that you might have had in microeconomics: you're choosing the value of beta so that the sum of the u i squared is as small as possible. You're minimizing it. And again, y hat is the estimated value of y, and y i is the actual value.

So once you know your betas, beta naught and beta 1, you can draw a regression line. You take the actual value of x and plug it into that equation and get y hat. So you take every individual in the survey, this person and this person and this person, put in their x, and then every individual has a y hat. And if you graph that, it makes a line, and that is the regression line.
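Here is a small sketch of that minimization property, with made-up toy data: the sum of squared residuals at the OLS betas is smaller than at any nearby candidate line.

```python
# Made-up toy data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

def ssr(b0, b1):
    # Sum of squared residuals for a candidate line: sum (y_i - y_hat_i)^2.
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

ssr_ols = ssr(b0, b1)
# Nudging the slope in either direction can only increase the SSR.
worse = min(ssr(b0, b1 + 0.1), ssr(b0, b1 - 0.1))
```

This is exactly the optimization-problem framing from the lecture: OLS picks the betas at the bottom of the SSR objective.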
And another way of thinking about it: you have this graph, you look at the value of x, you go up to the regression line, you go over and see what the y-value on that line is, and that is y hat. And again, in OLS, the program chooses the beta hats, so beta 1 hat and beta naught hat, to minimize the sum of squared residuals.

So if you're into matrix algebra, if you're nerdy (I'm not quite this nerdy, although I guess I am a bit nerdy), here is the optimization problem. And you can see that beta hat equals x prime x inverse x prime y. Do you see that it sort of looks like that formula that I showed you, the mean slope, or the covariance of x and y divided by the variance of x? And if you wanna know more, here's a proof: basically, you minimize u prime u, the sum of squared residuals, by choosing beta.

And again, this is the sample regression line.
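If you want to try the matrix version yourself, here's a minimal NumPy sketch with made-up toy data; the slope it returns matches the covariance-over-variance formula.

```python
import numpy as np

# Made-up toy data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.9])

# X includes a column of ones so the first coefficient is the intercept.
X = np.column_stack([np.ones_like(x), x])

# The matrix form of OLS: beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y  # [beta_0_hat, beta_1_hat]
```

In practice you would use a numerically safer solver such as `np.linalg.lstsq`, but the explicit inverse mirrors the formula on the slide.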
So for every individual, you can input an x and calculate what their y hat is. Or even for any value of x, even if someone didn't actually answer that on a survey, you can say: well, if someone had given that value of x, what would we predict they would have said for y? That would be their y hat, and you can use this formula.

The reason we spend so much time on OLS: think back to the criteria for estimators. When a number of assumptions hold, OLS is BLUE, the best linear unbiased estimator. So "best" means the most efficient, the lowest variance; "unbiased" we learned about when we covered estimators; and it is a linear function. So we're gonna see these five assumptions in the next few slides. When these hold true, the first four are needed for the beta to be unbiased, and if the fifth one also holds, it's also the most efficient.

So here are the five assumptions, and I'm gonna go through each one in each of the next few slides. But here they are. First, the model is linear in parameters.
So this is a linear model: you can model the phenomenon that you're interested in with a linear equation. Second is random sampling: the way that you draw your sample is random. So you're not privileging one group over another, or just taking whoever comes along, or only surveying older people, or lower-income people, or only men, or anything like that. It's a random sample that resembles the population.

Third is that x is a non-stochastic constant. So your x's do not change over time, or at least during the timeframe of the sampling. If you go and take another sample, if you ask the same person again, their x's will be the same. And another part of this is that not everyone gives the same value for x; x has to have some variation. Because if x is income and y is expenditure, and everybody has the exact same x, how do you know how y changes? Think about graphing it: how would you know the slope of a line if everybody gives the exact same x? You need some variation in x.
The fourth one is the conditional mean assumption: the expected value of everybody's error term, given their x, equals zero for every individual. So x has no effect on the expected value of the error term.

And finally, the fifth one is needed for OLS to be the most efficient estimator. And that is that the variance of the error terms is a constant, and the covariance of the error terms of two individuals, your error term and my error term, is zero. So your error term has nothing to do with my error term. Thinking of our bingo ball example: the bingo ball that you choose has no effect on which bingo ball I choose. We call this homoskedasticity and uncorrelated errors. And another way of thinking about this, and I'll say more when we cover it in depth, is that the variance-covariance matrix is sigma squared times I, where I is the identity matrix. Thinking about an n by n matrix: on the diagonal going from top left to bottom right, everything is this constant, sigma squared, and every other value is zero.

Let's talk about each one in a bit more depth.
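That sigma-squared-times-identity structure can be sketched directly; the sample size and variance here are hypothetical values chosen just to show the shape.

```python
# Spherical errors in matrix form: Var(u) = sigma^2 * I.
# Hypothetical sample of n = 4 people with error variance sigma^2 = 2.5.
n, sigma_sq = 4, 2.5

# Each diagonal entry is one person's error variance (all equal:
# homoskedasticity); each off-diagonal entry is the covariance between
# two people's errors (all zero: uncorrelated errors).
V = [[sigma_sq if i == j else 0.0 for j in range(n)] for i in range(n)]
```

Heteroskedasticity would put different values on the diagonal; autocorrelation would put nonzero values off the diagonal.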
So our first assumption is that the dependent variable can be calculated as a linear function of the independent variable plus the disturbance. Violations of this are thought of as specification errors. First, if you choose the wrong regressors; second, if it's not actually a linear function, that is, not linear in the parameters; or if the parameters change over time, so the slope of the line gets greater or lesser over time. We assume that these parameters are fixed in the population for any given sampling period.

Next is the assumption of random sampling. One way of thinking about it, and the most common way, is that every individual in the population has an equal chance of being selected into the sample. We're not oversampling any group, again, by gender, by race, by income, by where they live, or anything like that. Basically, you end up with a sample that is representative of the population.

The third one, again, is that x is a non-stochastic constant. So we have a random dataset of observations of y and x, and the x's do not change over the sampling period.
Again, it's also important that not every individual has the same value for x. And these observations are fixed in repeated samples: if you drew the same sample again, the x's would have the same values. If you were in the sample in one round and then we did it again in another round, all of your x's would be the same. Your y's would not be the same, because of the error term, which makes them go up and down; again, it depends on which bingo ball you draw.

The three main violations of this are, first, errors in variables: if you make a mistake in how you measure x. We're gonna talk near the end of class about what we can do about that. Second, what's called autoregression: if last year's y is a regressor in this year's y, then this assumption also falls apart. And third, simultaneous equations, where we actually have two dependent variables that mutually affect each other. In the model that we've been seeing, we can think of it as one-way causality: a change in x drives a change in y.
Whereas with simultaneous equations, they're simultaneously driving each other. It's sort of an arrow with two arrowheads, one on each side; whereas in this model it's just x and an arrow pointing to y, if that makes sense.

Finally, and again a really important assumption: the expected value of the disturbance term is zero, and the value of x has no effect on the expected value of the error term. The error term that you draw is truly a random occurrence, not dependent on what your x is. So for high-income folks or low-income folks or medium-income folks, their income has no effect on what error term they get. And the mean of it must be zero; if that's not true, it leads to a biased intercept. So the big point here is that for every individual i, the u i and the x i are uncorrelated.

If these four things hold, then beta OLS, the OLS estimator of beta, is unbiased: OLS will yield an unbiased estimator. So those first four are needed for beta OLS to be unbiased.
446 00:31:02,100 --> 00:31:04,560 And if this fifth assumption holds, 447 00:31:04,560 --> 00:31:09,220 then it's also the most efficient estimator. 448 00:31:09,220 --> 00:31:12,063 So the one with the lowest variance, 449 00:31:12,063 --> 00:31:13,730 so the one that, 450 00:31:13,730 --> 00:31:17,430 if you can imagine the bell curve 451 00:31:17,430 --> 00:31:20,620 where it's sort of the skinniest bell curve, 452 00:31:20,620 --> 00:31:25,020 it's most sort of bunched up around the mean. 453 00:31:25,020 --> 00:31:27,790 A tall, skinny bell curve would be 454 00:31:27,790 --> 00:31:31,270 a more efficient estimator. 455 00:31:31,270 --> 00:31:33,220 A short, fat bell curve 456 00:31:33,220 --> 00:31:36,683 would be the less efficient estimator. 457 00:31:38,280 --> 00:31:43,280 And we assume that there are so-called spherical errors 458 00:31:43,480 --> 00:31:46,130 and that all the disturbances, 459 00:31:46,130 --> 00:31:48,070 they have the same variance 460 00:31:48,070 --> 00:31:51,610 and are not correlated with each other. 461 00:31:51,610 --> 00:31:52,443 So 462 00:31:57,300 --> 00:32:00,360 when they have equal variance, 463 00:32:00,360 --> 00:32:01,993 it's called homoskedasticity, 464 00:32:01,993 --> 00:32:05,450 so that the value of x has no effect 465 00:32:05,450 --> 00:32:10,083 on what the variance of the error term is. 466 00:32:12,630 --> 00:32:14,850 When we have heteroskedasticity, 467 00:32:14,850 --> 00:32:19,250 it means that the variance changes as X changes. 468 00:32:19,250 --> 00:32:20,083 And again, 469 00:32:20,083 --> 00:32:22,380 we're gonna spend a whole week on this in, 470 00:32:22,380 --> 00:32:23,803 like, four or five weeks. 471 00:32:24,960 --> 00:32:29,960 There's also autocorrelation. This is also a violation. 472 00:32:32,130 --> 00:32:37,120 And that is where the disturbances are correlated. 
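To make the homoskedasticity idea concrete, here is a minimal sketch with assumed numbers (not course data): the disturbance's standard deviation is deliberately made proportional to x, so the error variance changes as x changes, which is exactly the heteroskedasticity violation described above.

```python
import random
import statistics

# Assumed toy setup (hypothetical): sd of the disturbance u is
# 0.5 * x, so the error variance grows with x -- heteroskedasticity.
random.seed(1)
xs = [1, 5, 10]
draws = {x: [random.gauss(0, 0.5 * x) for _ in range(4000)] for x in xs}
for x in xs:
    print(x, round(statistics.stdev(draws[x]), 2))
# Under homoskedasticity all three spreads would be roughly equal;
# here the spread at x = 10 is about ten times the spread at x = 1.
```

The printed spreads fan out as x grows, which is the pattern you would also see in a residual plot from a heteroskedastic regression.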
473 00:32:38,150 --> 00:32:43,150 So the bingo ball that you draw will then affect 474 00:32:43,570 --> 00:32:45,850 which bingo ball that I draw. 475 00:32:45,850 --> 00:32:48,017 That is autocorrelation. 476 00:32:50,970 --> 00:32:54,110 And those are the violations, again, 477 00:32:54,110 --> 00:32:56,683 that we're gonna spend more time on. 478 00:32:58,350 --> 00:33:01,680 But if we have these spherical errors, 479 00:33:01,680 --> 00:33:06,270 so if we have homoskedasticity and uncorrelated errors, 480 00:33:08,310 --> 00:33:13,023 then this is the most efficient estimator as well. 481 00:33:14,950 --> 00:33:16,380 The way we express it is, 482 00:33:16,380 --> 00:33:18,890 if 1 through 5 hold, OLS is BLUE, 483 00:33:18,890 --> 00:33:22,510 it's the best linear unbiased estimator. 484 00:33:22,510 --> 00:33:24,040 And again, 485 00:33:24,040 --> 00:33:27,280 if this spherical error assumption holds, 486 00:33:27,280 --> 00:33:30,163 it means it's the most efficient estimator. 487 00:33:36,800 --> 00:33:40,523 There are a few other conditions that have to hold. 488 00:33:41,960 --> 00:33:46,830 First, that the number of observations, so your n, 489 00:33:46,830 --> 00:33:51,760 has to be greater than the number of your regressors, k. 490 00:33:51,760 --> 00:33:54,440 So the degrees of freedom 491 00:33:56,550 --> 00:33:58,650 is the number of observations 492 00:33:58,650 --> 00:34:00,563 minus the number of regressors. 493 00:34:03,410 --> 00:34:07,650 And that's how this condition is measured. 494 00:34:07,650 --> 00:34:10,163 It must always be at least 1. 495 00:34:15,013 --> 00:34:19,140 And more degrees of freedom are better, 496 00:34:21,220 --> 00:34:24,080 they give more precise estimates. 497 00:34:24,080 --> 00:34:28,220 Also, we're gonna learn more about this too. 498 00:34:28,220 --> 00:34:33,220 There can't be an exact linear relationship among the X's. 
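This "no exact linear relationship among the X's" condition can be seen numerically. A sketch with made-up values: if one regressor is an exact linear function of another, the X'X matrix in the OLS normal equations is singular (determinant zero), so there is no unique estimate.

```python
# Sketch with assumed toy data: x2 is an exact linear function of x1,
# so the columns of X are linearly dependent and X'X is singular.
x1 = [10.0, 20.0, 30.0, 40.0]        # e.g. income
x2 = [5 * v + 1 for v in x1]         # exact combination: 5 * income + 1

# Build the 3x3 X'X matrix for the columns [intercept, x1, x2].
cols = [[1.0] * len(x1), x1, x2]
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(xtx))  # 0.0: singular, so no unique OLS solution
```

With a zero determinant the normal equations can't be solved uniquely, which is why each regressor has to bring some information that isn't already a linear function of the others.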
499 00:34:35,370 --> 00:34:38,847 So if your first X is income, 500 00:34:44,070 --> 00:34:47,240 and another X is 501 00:34:49,000 --> 00:34:52,050 5 times your income plus 1, 502 00:34:52,050 --> 00:34:55,060 they're a perfect linear combination of each other. 503 00:34:55,060 --> 00:34:58,770 And if you know one, you sort of know the other. 504 00:34:58,770 --> 00:35:02,790 So each regressor has to add some new information 505 00:35:02,790 --> 00:35:06,120 and not just be a simple function of another. 506 00:35:06,120 --> 00:35:10,120 So we're gonna learn a bit more about that. 507 00:35:10,120 --> 00:35:12,460 A good example for that 508 00:35:12,460 --> 00:35:16,270 is degrees Fahrenheit and degrees Celsius. 509 00:35:16,270 --> 00:35:20,130 So they are linear combinations of each other 510 00:35:20,130 --> 00:35:23,530 and you cannot include both of them in a model, 511 00:35:23,530 --> 00:35:26,010 say, of how fast a plant grows or, 512 00:35:26,010 --> 00:35:29,653 you know, how thickly the ice forms or anything like that. 513 00:35:35,910 --> 00:35:40,910 So OLS kind of has a special place in econometrics. 514 00:35:43,540 --> 00:35:44,710 It's the 515 00:35:47,950 --> 00:35:50,300 simplest estimator in some ways, 516 00:35:50,300 --> 00:35:53,793 but it also scores high on a lot of the other criteria. 517 00:35:54,720 --> 00:35:58,860 So think back to computational cost. 518 00:35:58,860 --> 00:36:03,330 I think any econometrics software package is gonna have OLS. 519 00:36:03,330 --> 00:36:05,380 So certainly SPSS does and R 520 00:36:05,380 --> 00:36:08,120 and all the other well-known ones. 521 00:36:08,120 --> 00:36:12,380 It minimizes the sum of squared residuals, basically by definition. 522 00:36:13,660 --> 00:36:17,110 It also maximizes R squared for much the same reason, 523 00:36:17,110 --> 00:36:18,573 and it is unbiased. 524 00:36:22,410 --> 00:36:24,080 And if all of these assumptions hold, 525 00:36:24,080 --> 00:36:26,250 it's the best unbiased. 
526 00:36:26,250 --> 00:36:31,250 That it will have the smallest variance-covariance matrix. 527 00:36:31,470 --> 00:36:36,040 And if the errors are normally distributed, 528 00:36:36,040 --> 00:36:41,010 it's not only BLUE, it's BUE, the best unbiased estimator. 529 00:36:41,010 --> 00:36:44,197 It may not minimize mean squared error, though: 530 00:36:47,270 --> 00:36:50,210 you could have a biased estimator 531 00:36:50,210 --> 00:36:52,450 with a really small variance, 532 00:36:52,450 --> 00:36:56,893 which then gives a smaller mean squared error. 533 00:36:58,580 --> 00:37:03,580 And last, if the disturbances are normally distributed, 534 00:37:03,660 --> 00:37:05,410 and we're gonna talk a lot about that 535 00:37:05,410 --> 00:37:07,660 when we talk about hypothesis tests. 536 00:37:07,660 --> 00:37:09,270 But if that holds true, 537 00:37:09,270 --> 00:37:14,270 then OLS and maximum likelihood estimation 538 00:37:14,500 --> 00:37:15,943 are exactly the same. 539 00:37:19,230 --> 00:37:20,910 So this is what we did. 540 00:37:20,910 --> 00:37:24,850 We looked at the single regressor example. 541 00:37:24,850 --> 00:37:28,950 We talked about the betas, the disturbances, 542 00:37:28,950 --> 00:37:33,110 how we get the OLS estimator, and the five assumptions. 543 00:37:33,110 --> 00:37:33,943 And again, 544 00:37:33,943 --> 00:37:37,053 if these five things hold, then OLS is BLUE. 545 00:37:37,920 --> 00:37:42,920 So I hope that you found this helpful as a review guide. 546 00:37:43,270 --> 00:37:45,023 And thanks.