- [Instructor] Hello and welcome to the lecture on classical linear regression. We're going to look at the case where we have one x in the linear model. It's the simplest case, and we'll build on it as we go.

So this is what we're gonna do: we're gonna look at the single-regressor example. Again, there are really not a lot of real-life examples where you can only have one x, but understanding the properties of this simplest model provides the building blocks that we can use as we add more regressors. So we'll start with that. We're gonna look at the betas, at the disturbances, or the error term, and at the OLS estimator: how does ordinary least squares derive the value of beta based on the data that we have? And we'll look at the five assumptions which lead to the OLS estimator being the best linear unbiased estimator that we can get when these five things hold.

So here is problem set 2. First: what's the relationship between beta naught and the expected value of the disturbance term?
Second, I want you to think about the single-regressor example: what does beta 1 hat measure? What does it mean? Give an example as needed. Third: describe the intuition of how beta 1 hat is derived. Fourth: what is the major implication of the linearity of this model? And last: what are the five assumptions, what is the major implication if they hold, and what is an example of a violation if they do not hold?

And then I would also like you to do two computer problems in SPSS, C2.1 and C2.2 in the Wooldridge text. I will provide the data and the questions.

So our simplest regression is when k equals 1, when we have one regressor, and you can see, here's our model: y equals beta naught plus beta 1 x 1 plus u. So y is known as the dependent variable; most commonly, that's what we're gonna call it. You may also see the explained, the response, the predicted, et cetera. x is the independent variable, sometimes called the regressor, and u is the error term, or the disturbance.

So this model has two betas.
So beta naught is the intercept, or the constant term. You can think of it as: if x equals zero, what's the value of y? Sometimes this has a sensible interpretation, sometimes not; it really depends on the nature of the model.

And beta 1 is the slope parameter. Usually, this is the figure that we're most interested in in econometrics; this is the primary relationship. And that is because y will change by beta 1 units if we change x by 1, as long as our error term does not change. We can express that as delta y equals beta 1 delta x. So if x changes by five units, the predicted value of y will change by five times beta 1.

An important thing about this is that the linearity assumption means that no matter the value of x, big or small, and whether you change x by a little or a lot, it doesn't matter where you start. The slope of the line doesn't change. A one-unit change in x will have the same effect on y, no matter where you start.
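To make that linearity concrete, here is a tiny numeric sketch. The slope value is made up purely for illustration:

```python
# Hypothetical slope for illustration only: beta_1 = 2.0.
beta_1 = 2.0

def delta_y(delta_x):
    # Linearity: delta y = beta_1 * delta x, holding the error term fixed.
    # Note that the starting value of x never enters the formula.
    return beta_1 * delta_x

one_unit = delta_y(1)    # a 1-unit change in x
five_units = delta_y(5)  # a 5-unit change in x is just five times as large
```

The point of the sketch is that the predicted change depends only on the size of the change in x, never on where x started.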
So one of the fundamental assumptions is that the expected value of our error term is zero. We assume that it has a mean of zero: if we draw over and over and over again, the mean is going to converge to zero. And mathematically, this is pretty easy to make happen: we can slide the intercept up and down to make sure that this holds true. So in a sense, we normalize everything else, especially the beta naught (the alpha term in the box on the right here), so that it's true. It's a very fundamental part of econometrics that the expected value of the error term is always zero.

Not only is the expected value of the error term always zero; the error term is also uncorrelated with x. This will be true in this model where we have one x, and with as many x's as we have: the error term is uncorrelated with every x. Another way of saying that is that the error term is mean independent of x. No matter what value of x you may have, so no matter what x someone might answer on a survey, the expected value of the error term is always the same.
Knowing the value of x does not tell you any information about the value of the error term; they are independent mathematically. You could also say that they are orthogonal: they don't have anything to do with each other, they have no correlation. So knowing something about x does not tell you anything about the error term.

Putting these two things together, from the last two slides, it means that the expected value of the error term given x always equals zero. This is called the conditional mean assumption. Again, no matter what value of x the person gives on the survey, the expected value of the error term for that person will always be zero. And therefore the expected value of y given x is always beta naught plus beta 1 x, since the expected value of the error term is zero. This actually forms the regression line, which we'll see in a few minutes. And again, a one-unit change in x changes the expected value of y by beta 1 units.
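A minimal simulation sketch of the conditional mean assumption, with made-up parameter values: if u is drawn with mean zero, independently of x, then the sample mean of the errors converges toward zero as we draw over and over.

```python
import random

random.seed(0)

# Made-up population parameters for this sketch.
beta_0, beta_1 = 1.0, 0.5

# Draw u independently of x with mean zero, so E[u | x] = 0 and
# E[y | x] = beta_0 + beta_1 * x.
xs = [random.uniform(0, 10) for _ in range(100_000)]
us = [random.gauss(0, 1) for _ in xs]  # mean zero, unrelated to x
ys = [beta_0 + beta_1 * x + u for x, u in zip(xs, us)]

# Drawing over and over again, the mean of the errors converges to zero.
mean_u = sum(us) / len(us)
```

With this many draws, `mean_u` lands very close to zero, which is the "draw over and over and the mean converges to zero" idea from the lecture.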
So what this means is, on average, for the average person, or for the expected value, if you change x by one unit, it will change y by exactly beta 1. That holds for the expected value, sort of on average: if we were to forecast what the effect is, that's what it would be. And it doesn't matter if our x is large or small, because it is a linear relationship.

So the lack of correlation between u and x is extremely important in econometrics. That might be the single most important thing that we learn here; it's the fundamental assumption. And to see why, let's assume that it is not true. Let's assume that u changes as x changes, and therefore du/dx, or delta u over delta x, does not equal zero. What we wanna know in econometrics is: what is the change in the expected value, the y hat, as x changes? How does a change in the value of x change the predicted, or expected, value of y? But we don't actually observe the expected value of y. All we observe is y.
And so if we change x and y changes, by looking at different data points, the change in y that we see may be due to the change in the expected value, but it may also be a result of the change in the error term. And we can't tell, and we can't pull those things apart. We can't see that this part is due to the expected value and that part is due to the error term, because all that we observe is y. What we really wanna know is beta, which is d(y hat)/dx. But if du/dx does not equal zero, we cannot make that forecast. This is a concept that we're gonna revisit a lot, but it's a very important one.

And so the interpretation, then, is that beta 1 is the change in y for the average person. It's the forecasted value, the expected value, but it may not be true in every case. So if x is income and y is expenditure, if you put an extra dollar in my pocket, I will spend beta 1 of that on the good that we are wondering about. That's the expected value; I may or may not spend exactly that much, but that is how we interpret this.
So now that we know some of the uses of our estimate of beta, let's think about how we derive it. Suppose we have our two-question survey: we ask x, we have y, and we have this linear model. Then beta 1 hat is a ratio. In the numerator (look over here at the second equation, in the box), we have the sum, over all observations, of each x minus the mean of x, times each y minus the mean of y. And in the denominator, we have the sum of each x minus the mean of x, squared. So another way of saying it: it's the covariance of x and y divided by the variance of x.

One of the ways that I like to think of it is as the mean slope. This is just an intuition, just one way that helps me think of it: the slope of a line is the change in y over the change in x. So in that second equation, you can kind of see the change in y divided by the change in x. It's sort of the average slope of how y changes as x changes.
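As a sketch in code, with made-up toy data, the slope formula can be computed directly: the numerator is the "covariance part" and the denominator is the "variance part" from the slide.

```python
# Made-up toy data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

# beta_1_hat = sum((x_i - x_bar)(y_i - y_bar)) / sum((x_i - x_bar)^2)
num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))  # covariance part
den = sum((x - x_bar) ** 2 for x in xs)                        # variance part

beta_1_hat = num / den
beta_0_hat = y_bar - beta_1_hat * x_bar  # intercept recovered from the means
```

With this toy data the slope comes out to 1.98, the "average slope" of y against x.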
Another way of thinking about it: for the covariance of x and y, the intuition is, to what extent do these two variables sort of march in lockstep or not? If one gets bigger, does the other get smaller, or the other way around? Do they both get bigger or both get smaller, or do they have no relationship? Is the relationship strong or weak, et cetera? And then that is divided by, or normalized by, the overall variation in x. So: how do these two things change together, normalized by how x itself varies? So those are a few ways of thinking about both the intuition and the math of the OLS estimator, this beta hat.

So what the OLS estimator does is, it minimizes the sum of squared residuals. It takes everybody's observation and it chooses beta so that the distance between their actual answer, what they actually said, y, and their predicted value, squared, is as small as possible. It minimizes the square of the distance between y and y hat.
Another way of thinking about it: specifically, it minimizes the sum of the u i squared. You take everybody's u i, square it, add them up, and beta OLS makes that number as small as it can. So it's just like an optimization problem that you might have had in microeconomics: you're choosing the value of beta so that the sum of the u i squared is as small as possible. You're minimizing it. And again, y hat is the estimated value of y, and y i is the actual value.

So once you know your betas, beta naught and beta 1, you can draw a regression line. You take the actual value of x and plug it into that equation and get y hat. So you take every individual in the survey, this person and this person and this person, put in their x, and then every individual has a y hat. And if you graph that, it makes a line, and that is the regression line.
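Here is a small sketch of that minimization property, with made-up toy data: the sum of squared residuals at the OLS betas is smaller than at any nearby candidate line.

```python
# Made-up toy data for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 8.1, 9.9]

x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
     sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

def ssr(b0, b1):
    # Sum of squared residuals for a candidate line: sum (y_i - y_hat_i)^2.
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

ssr_ols = ssr(b0, b1)
# Nudging the slope in either direction can only increase the SSR.
worse = min(ssr(b0, b1 + 0.1), ssr(b0, b1 - 0.1))
```

This is exactly the optimization-problem framing from the lecture: OLS picks the betas at the bottom of the SSR objective.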
And another way of thinking about it: you have this graph, you look at the value of x, you go up to the regression line, you go over and see what the y-value on that line is, and that is y hat. And again, in OLS, the program chooses the beta hats, so beta 1 hat and beta naught hat, to minimize the sum of squared residuals.

So if you're into matrix algebra, if you're nerdy (I'm not quite this nerdy, although I guess I am a bit nerdy), here is the optimization problem. And you can see that beta hat equals x prime x inverse x prime y. Do you see that it sort of looks like that formula that I showed you, the mean slope, or the covariance of x and y divided by the variance of x? And if you wanna know more, here's a proof: basically, you minimize u prime u, the sum of squared residuals, by choosing beta.

And again, this is the sample regression line.
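If you want to try the matrix version yourself, here's a minimal NumPy sketch with made-up toy data; the slope it returns matches the covariance-over-variance formula.

```python
import numpy as np

# Made-up toy data for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.9])

# X includes a column of ones so the first coefficient is the intercept.
X = np.column_stack([np.ones_like(x), x])

# The matrix form of OLS: beta_hat = (X'X)^{-1} X'y.
beta_hat = np.linalg.inv(X.T @ X) @ X.T @ y  # [beta_0_hat, beta_1_hat]
```

In practice you would use a numerically safer solver such as `np.linalg.lstsq`, but the explicit inverse mirrors the formula on the slide.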
So for every individual, you can input an x and calculate what their y hat is. Or even for any value of x, even if someone didn't actually answer that on a survey, you can say: well, if someone had given that value of x, what would we predict they would have said for y? That would be their y hat, and you can use this formula.

The reason we spend so much time on OLS: think back to the criteria for estimators. When a number of assumptions hold, OLS is BLUE, the best linear unbiased estimator. So "best" means the most efficient, the lowest variance; "unbiased" we learned about when we covered estimators; and it is a linear function. So we're gonna see these five assumptions in the next few slides. When these hold true, the first four are needed for the beta to be unbiased, and if the fifth one also holds, it's also the most efficient.

So here are the five assumptions, and I'm gonna go through each one in each of the next few slides. But here they are. First, the model is linear in parameters.
So this is a linear model: you can model the phenomenon that you're interested in with a linear equation. Second is random sampling: the way that you draw your sample is random. So you're not privileging one group over another, or just taking whoever comes along, or only surveying older people, or lower-income people, or only men, or anything like that. It's a random sample that resembles the population.

Third is that x is a non-stochastic constant. So your x's do not change over time, or at least during the timeframe of the sampling. If you go and take another sample, if you ask the same person again, their x's will be the same. And another part of this is that not everyone gives the same value for x; x has to have some variation. Because if x is income and y is expenditure, and everybody has the exact same x, how do you know how y changes? Think about graphing it: how would you know the slope of a line if everybody gives the exact same x? You need some variation in x.
The fourth one is the conditional mean assumption: the expected value of everybody's error term, given their x, equals zero for every individual. So x has no effect on the expected value of the error term.

And finally, the fifth one is needed for OLS to be the most efficient estimator. And that is that the variance of the error terms is a constant, and the covariance of the error terms of two individuals, your error term and my error term, is zero. So your error term has nothing to do with my error term. Thinking of our bingo ball example: the bingo ball that you choose has no effect on which bingo ball I choose. We call this homoskedasticity and uncorrelated errors. And another way of thinking about this, and I'll say more when we cover it in depth, is that the variance-covariance matrix is sigma squared times I, where I is the identity matrix. Thinking about an n by n matrix: on the diagonal going from top left to bottom right, everything is this constant, sigma squared, and every other value is zero.

Let's talk about each one in a bit more depth.
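That sigma-squared-times-identity structure can be sketched directly; the sample size and variance here are hypothetical values chosen just to show the shape.

```python
# Spherical errors in matrix form: Var(u) = sigma^2 * I.
# Hypothetical sample of n = 4 people with error variance sigma^2 = 2.5.
n, sigma_sq = 4, 2.5

# Each diagonal entry is one person's error variance (all equal:
# homoskedasticity); each off-diagonal entry is the covariance between
# two people's errors (all zero: uncorrelated errors).
V = [[sigma_sq if i == j else 0.0 for j in range(n)] for i in range(n)]
```

Heteroskedasticity would put different values on the diagonal; autocorrelation would put nonzero values off the diagonal.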
So our first assumption is that the dependent variable can be calculated as a linear function of the independent variable plus the disturbance. Violations of this are thought of as specification errors. First, if you choose the wrong regressors; second, if it's not actually a linear function, that is, not linear in the parameters; or if the parameters change over time, so the slope of the line gets greater or lesser over time. We assume that these parameters are fixed in the population for any given sampling period.

Next is the assumption of random sampling. One way of thinking about it, and the most common way, is that every individual in the population has an equal chance of being selected into the sample. We're not oversampling any group, again, by gender, by race, by income, by where they live, or anything like that. Basically, you end up with a sample that is representative of the population.

The third one, again, is that x is a non-stochastic constant. So we have a random dataset of observations of y and x, and the x's do not change over the sampling period.
Again, it's also important that not every individual has the same value for x. And these observations are fixed in repeated samples: if you drew the same sample again, the x's would have the same values. If you were in the sample in one round and then we did it again in another round, all of your x's would be the same. Your y's would not be the same, because of the error term, which makes them go up and down; again, it depends on which bingo ball you draw.

The three main violations of this are, first, errors in variables: if you make a mistake in how you measure x. We're gonna talk near the end of class about what we can do about that. Second, what's called autoregression: if last year's y is a regressor in this year's y, then this assumption also falls apart. And third, simultaneous equations, where we actually have two dependent variables that mutually affect each other. In the model that we've been seeing, we can think of it as one-way causality: a change in x drives a change in y.
Whereas with simultaneous equations, they're simultaneously driving each other. It's sort of an arrow with two arrowheads, one on each side; whereas in this model it's just x and an arrow pointing to y, if that makes sense.

Finally, and again a really important assumption: the expected value of the disturbance term is zero, and the value of x has no effect on the expected value of the error term. The error term that you draw is truly a random occurrence, not dependent on what your x is. So for high-income folks or low-income folks or medium-income folks, their income has no effect on what error term they get. And the mean of it must be zero; if that's not true, it leads to a biased intercept. So the big point here is that for every individual i, the u i and the x i are uncorrelated.

If these four things hold, then beta OLS, the OLS estimator of beta, is unbiased: OLS will yield an unbiased estimator. So those first four are needed for beta OLS to be unbiased.
446 00:31:02,100 --> 00:31:04,560 And if this fifth assumption holds, 447 00:31:04,560 --> 00:31:09,220 then it's also the most efficient estimator. 448 00:31:09,220 --> 00:31:12,063 So the one with the lowest variance, 449 00:31:12,063 --> 00:31:13,730 so the one that, 450 00:31:13,730 --> 00:31:17,430 if you can imagine the bell curve 451 00:31:17,430 --> 00:31:20,620 where it's sort of the skinniest bell curve, 452 00:31:20,620 --> 00:31:25,020 it's most sort of bunched up around the mean. 453 00:31:25,020 --> 00:31:27,790 A tall, skinny bell curve would be 454 00:31:27,790 --> 00:31:31,270 a more efficient estimator. 455 00:31:31,270 --> 00:31:33,220 A short, fat bell curve 456 00:31:33,220 --> 00:31:36,683 would be the less efficient estimator. 457 00:31:38,280 --> 00:31:43,280 And we assume that there are so-called spherical errors 458 00:31:43,480 --> 00:31:46,130 and that all the disturbances, 459 00:31:46,130 --> 00:31:48,070 they have the same variance 460 00:31:48,070 --> 00:31:51,610 and are not correlated with each other. 461 00:31:51,610 --> 00:31:52,443 So 462 00:31:57,300 --> 00:32:00,360 when they have equal variance, 463 00:32:00,360 --> 00:32:01,993 it's called homoskedasticity, 464 00:32:01,993 --> 00:32:05,450 so that the value of x has no effect 465 00:32:05,450 --> 00:32:10,083 on what the variance of the error term is. 466 00:32:12,630 --> 00:32:14,850 When we have heteroskedasticity, 467 00:32:14,850 --> 00:32:19,250 it means that the variance changes as X changes. 468 00:32:19,250 --> 00:32:20,083 And again, 469 00:32:20,083 --> 00:32:22,380 we're gonna spend a whole week on this in, 470 00:32:22,380 --> 00:32:23,803 like, four or five weeks. 471 00:32:24,960 --> 00:32:29,960 There's also autocorrelation. This is also a violation. 472 00:32:32,130 --> 00:32:37,120 And that is where the disturbances are correlated. 
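To make the homoskedasticity idea concrete, here is a minimal sketch with assumed numbers (not course data): the disturbance's standard deviation is deliberately made proportional to x, so the error variance changes as x changes, which is exactly the heteroskedasticity violation described above.

```python
import random
import statistics

# Assumed toy setup (hypothetical): sd of the disturbance u is
# 0.5 * x, so the error variance grows with x -- heteroskedasticity.
random.seed(1)
xs = [1, 5, 10]
draws = {x: [random.gauss(0, 0.5 * x) for _ in range(4000)] for x in xs}
for x in xs:
    print(x, round(statistics.stdev(draws[x]), 2))
# Under homoskedasticity all three spreads would be roughly equal;
# here the spread at x = 10 is about ten times the spread at x = 1.
```

The printed spreads fan out as x grows, which is the pattern you would also see in a residual plot from a heteroskedastic regression.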
473 00:32:38,150 --> 00:32:43,150 So the bingo ball that you draw will then affect 474 00:32:43,570 --> 00:32:45,850 which bingo ball that I draw. 475 00:32:45,850 --> 00:32:48,017 That is autocorrelation. 476 00:32:50,970 --> 00:32:54,110 And those are the violations, again, 477 00:32:54,110 --> 00:32:56,683 that we're gonna spend more time on. 478 00:32:58,350 --> 00:33:01,680 But if we have these spherical errors, 479 00:33:01,680 --> 00:33:06,270 so if we have homoskedasticity and uncorrelated errors, 480 00:33:08,310 --> 00:33:13,023 then this is the most efficient estimator as well. 481 00:33:14,950 --> 00:33:16,380 The way we express it is, 482 00:33:16,380 --> 00:33:18,890 if 1 through 5 hold, OLS is BLUE, 483 00:33:18,890 --> 00:33:22,510 it's the best linear unbiased estimator. 484 00:33:22,510 --> 00:33:24,040 And again, 485 00:33:24,040 --> 00:33:27,280 if this spherical error assumption holds, 486 00:33:27,280 --> 00:33:30,163 it means it's the most efficient estimator. 487 00:33:36,800 --> 00:33:40,523 There are a few other conditions that have to hold. 488 00:33:41,960 --> 00:33:46,830 First, that the number of observations, so your n, 489 00:33:46,830 --> 00:33:51,760 has to be greater than the number of your regressors, k. 490 00:33:51,760 --> 00:33:54,440 So the degrees of freedom 491 00:33:56,550 --> 00:33:58,650 is the number of observations 492 00:33:58,650 --> 00:34:00,563 minus the number of regressors. 493 00:34:03,410 --> 00:34:07,650 And that's how this condition is measured. 494 00:34:07,650 --> 00:34:10,163 It must always be at least 1. 495 00:34:15,013 --> 00:34:19,140 And more degrees of freedom are better, 496 00:34:21,220 --> 00:34:24,080 they give more precise estimates. 497 00:34:24,080 --> 00:34:28,220 Also, we're gonna learn more about this too. 498 00:34:28,220 --> 00:34:33,220 There can't be an exact linear relationship among the X's. 
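This "no exact linear relationship among the X's" condition can be seen numerically. A sketch with made-up values: if one regressor is an exact linear function of another, the X'X matrix in the OLS normal equations is singular (determinant zero), so there is no unique estimate.

```python
# Sketch with assumed toy data: x2 is an exact linear function of x1,
# so the columns of X are linearly dependent and X'X is singular.
x1 = [10.0, 20.0, 30.0, 40.0]        # e.g. income
x2 = [5 * v + 1 for v in x1]         # exact combination: 5 * income + 1

# Build the 3x3 X'X matrix for the columns [intercept, x1, x2].
cols = [[1.0] * len(x1), x1, x2]
xtx = [[sum(a * b for a, b in zip(ci, cj)) for cj in cols] for ci in cols]

def det3(m):
    # determinant of a 3x3 matrix by cofactor expansion
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
          - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
          + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))

print(det3(xtx))  # 0.0: singular, so no unique OLS solution
```

With a zero determinant the normal equations can't be solved uniquely, which is why each regressor has to bring some information that isn't already a linear function of the others.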
499 00:34:35,370 --> 00:34:38,847 So if your first X is income, 500 00:34:44,070 --> 00:34:47,240 and another X is 501 00:34:49,000 --> 00:34:52,050 5 times your income plus 1, 502 00:34:52,050 --> 00:34:55,060 they're a perfect linear combination of each other. 503 00:34:55,060 --> 00:34:58,770 And if you know one, you sort of know the other. 504 00:34:58,770 --> 00:35:02,790 So each regressor has to add some new information 505 00:35:02,790 --> 00:35:06,120 and not just be a simple function of another. 506 00:35:06,120 --> 00:35:10,120 So we're gonna learn a bit more about that. 507 00:35:10,120 --> 00:35:12,460 A good example for that 508 00:35:12,460 --> 00:35:16,270 is degrees Fahrenheit and degrees Celsius. 509 00:35:16,270 --> 00:35:20,130 So they are linear combinations of each other 510 00:35:20,130 --> 00:35:23,530 and you cannot include both of them in a model, 511 00:35:23,530 --> 00:35:26,010 say, of how fast a plant grows or, 512 00:35:26,010 --> 00:35:29,653 you know, how thickly the ice forms or anything like that. 513 00:35:35,910 --> 00:35:40,910 So OLS kind of has a special place in econometrics. 514 00:35:43,540 --> 00:35:44,710 It's the 515 00:35:47,950 --> 00:35:50,300 simplest estimator in some ways, 516 00:35:50,300 --> 00:35:53,793 but it also scores high on a lot of the other criteria. 517 00:35:54,720 --> 00:35:58,860 So think back to computational cost. 518 00:35:58,860 --> 00:36:03,330 I think any econometrics software package is gonna have OLS. 519 00:36:03,330 --> 00:36:05,380 So certainly SPSS does and R 520 00:36:05,380 --> 00:36:08,120 and all the other well-known ones. 521 00:36:08,120 --> 00:36:12,380 It minimizes the sum of squared residuals, basically by definition. 522 00:36:13,660 --> 00:36:17,110 It also maximizes R squared for much the same reason, 523 00:36:17,110 --> 00:36:18,573 and it is unbiased. 524 00:36:22,410 --> 00:36:24,080 And if all of these assumptions hold, 525 00:36:24,080 --> 00:36:26,250 it's the best unbiased. 
526 00:36:26,250 --> 00:36:31,250 That it will have the smallest variance-covariance matrix. 527 00:36:31,470 --> 00:36:36,040 And if the errors are normally distributed, 528 00:36:36,040 --> 00:36:41,010 it's not only BLUE, it's BUE, the best unbiased estimator. 529 00:36:41,010 --> 00:36:44,197 It may not minimize mean squared error, though: 530 00:36:47,270 --> 00:36:50,210 you could have a biased estimator 531 00:36:50,210 --> 00:36:52,450 with a really small variance, 532 00:36:52,450 --> 00:36:56,893 which then gives a smaller mean squared error. 533 00:36:58,580 --> 00:37:03,580 And last, if the disturbances are normally distributed, 534 00:37:03,660 --> 00:37:05,410 and we're gonna talk a lot about that 535 00:37:05,410 --> 00:37:07,660 when we talk about hypothesis tests. 536 00:37:07,660 --> 00:37:09,270 But if that holds true, 537 00:37:09,270 --> 00:37:14,270 then OLS and maximum likelihood estimation 538 00:37:14,500 --> 00:37:15,943 are exactly the same. 539 00:37:19,230 --> 00:37:20,910 So this is what we did. 540 00:37:20,910 --> 00:37:24,850 We looked at the single regressor example. 541 00:37:24,850 --> 00:37:28,950 We talked about the betas, the disturbances, 542 00:37:28,950 --> 00:37:33,110 how we get the OLS estimator, and the five assumptions. 543 00:37:33,110 --> 00:37:33,943 And again, 544 00:37:33,943 --> 00:37:37,053 if these five things hold, then OLS is BLUE. 545 00:37:37,920 --> 00:37:42,920 So I hope that you found this helpful as a review guide. 546 00:37:43,270 --> 00:37:45,023 And thanks.