1
00:00:01,125 --> 00:00:03,053
[Instructor] Now, I'm gonna show you how

2
00:00:03,053 --> 00:00:07,950
to estimate an equation,

3
00:00:07,950 --> 00:00:10,920
both for all the respondents

4
00:00:10,920 --> 00:00:13,789
and also how to split the sample

5
00:00:13,789 --> 00:00:17,409
into two groups so that
we could do a Chow test.

6
00:00:17,409 --> 00:00:19,860
So the first thing that I'm gonna do is

7
00:00:19,860 --> 00:00:22,962
to run this model right here where,

8
00:00:22,962 --> 00:00:27,383
this model estimates how much sleep

9
00:00:27,383 --> 00:00:30,749
that an individual gets
based on how much they work,

10
00:00:30,749 --> 00:00:32,730
their education, their age,

11
00:00:32,730 --> 00:00:36,690
the square of their age, and
whether they have young kids.

12
00:00:36,690 --> 00:00:41,690
So we go to the dataset, and
we do a linear regression,

13
00:00:44,651 --> 00:00:47,920
where sleep is the Dependent

14
00:00:48,840 --> 00:00:53,840
and then the Independents
are the total work, educ,

15
00:01:04,108 --> 00:01:09,108
age, age squared, and the
last one is young kids.

16
00:01:17,367 --> 00:01:22,367
So this looks at just these variables.

17
00:01:30,030 --> 00:01:34,350
And note that we're not
controlling here for gender,

18
00:01:34,350 --> 00:01:37,015
we do have a dummy variable, male,

19
00:01:37,015 --> 00:01:40,710
but we're not going to use that yet.

20
00:01:40,710 --> 00:01:44,264
So we're gonna say OK.

21
00:01:44,264 --> 00:01:47,849
We look at the results, and we see that,

22
00:01:47,849 --> 00:01:52,849
again not a huge R-squared,
this is as before,

23
00:01:57,180 --> 00:02:02,180
that R-squared is the regression sum
of squares divided by the total sum of squares.

24
00:02:04,200 --> 00:02:06,406
It looks like it's .115.

25
00:02:06,406 --> 00:02:11,406
And then, here, we see that
the significant variables are

26
00:02:13,650 --> 00:02:18,650
total work and
education, which is significant

27
00:02:23,280 --> 00:02:24,810
just barely.

28
00:02:24,810 --> 00:02:28,650
So the more that you work
and the higher education

29
00:02:28,650 --> 00:02:30,967
that you have, the more that you sleep.

30
00:02:30,967 --> 00:02:33,992
So now, we're gonna say,
"Does gender matter?"

31
00:02:33,992 --> 00:02:36,510
Well, one of the ways
that we could do that is

32
00:02:36,510 --> 00:02:41,177
to just include the dummy variable male.

33
00:02:41,177 --> 00:02:42,419
So let's do that.

34
00:02:42,419 --> 00:02:47,419
Let's go back to our data
and run the same regression,

35
00:02:49,587 --> 00:02:54,587
except now we're just gonna
add male to it, and say OK.

36
00:02:58,980 --> 00:03:00,524
And when we scroll down,

37
00:03:00,524 --> 00:03:05,524
we see that male is pretty significant

38
00:03:06,016 --> 00:03:10,140
and with a strong positive beta.

39
00:03:10,140 --> 00:03:12,603
So being male means that you sleep more.

40
00:03:13,615 --> 00:03:16,397
So two ways that we could do this is to,

41
00:03:16,397 --> 00:03:21,397
if we think that the effect of
work, and education,

42
00:03:22,243 --> 00:03:27,243
and age, and all of these
other things is different

43
00:03:27,570 --> 00:03:32,570
for males and non-males, we
could create interaction terms.

44
00:03:33,090 --> 00:03:35,125
And I actually did that.

45
00:03:35,125 --> 00:03:40,125
So I'll show you what that looks like.

46
00:03:42,663 --> 00:03:47,663
The way that you would do that
is to transform and compute.

47
00:03:49,080 --> 00:03:53,670
I won't actually do it, but you could do,

48
00:03:53,670 --> 00:03:58,670
name it something, and then
set it equal to male times,

49
00:04:01,484 --> 00:04:05,167
and then say age, and name it that.

50
00:04:08,700 --> 00:04:10,110
I already did that.

51
00:04:10,110 --> 00:04:13,500
So you'll see that in the dataset,

52
00:04:13,500 --> 00:04:16,290
but that's how you would do
it or that's how I did it.

53
00:04:16,290 --> 00:04:18,003
But we're gonna get rid of this.

54
00:04:22,257 --> 00:04:27,257
So now, let's run it with
all of those interactions.

55
00:04:30,873 --> 00:04:34,548
So again, we're just gonna
run a linear regression.

56
00:04:34,548 --> 00:04:37,200
We're gonna leave all those things in.

57
00:04:37,200 --> 00:04:42,200
And now, we're also gonna add
all these interaction terms.

58
00:04:43,650 --> 00:04:44,610
And note that,

59
00:04:44,610 --> 00:04:47,430
we're gonna lose a bunch
of degrees of freedom

60
00:04:47,430 --> 00:04:48,764
by doing this.

61
00:04:48,764 --> 00:04:53,764
But let's see what we get and
let's look at the results.

62
00:04:55,500 --> 00:04:58,774
So again, we added regressors,
so R-squared goes up.

63
00:04:58,774 --> 00:05:02,278
And now, we see that, really,

64
00:05:02,278 --> 00:05:06,999
these are not very significant.

65
00:05:06,999 --> 00:05:11,013
We see male is no longer significant,

66
00:05:11,013 --> 00:05:14,700
but age and age squared are just barely.

67
00:05:14,700 --> 00:05:19,700
So work is, and male, young kid is barely.

68
00:05:22,875 --> 00:05:27,875
So the last thing I wanted
to do is to run a Chow test

69
00:05:28,171 --> 00:05:33,171
and we're gonna get rid
of all of these regressors

70
00:05:34,770 --> 00:05:39,770
that have male in them, but
we're gonna split the sample.

71
00:05:40,080 --> 00:05:43,140
So the way that you do that is,

72
00:05:43,140 --> 00:05:48,140
you go to Data, and we split our file,

73
00:05:50,892 --> 00:05:55,770
and we are gonna Compare Groups by male.

74
00:05:56,640 --> 00:05:59,310
So we put male there.

75
00:05:59,310 --> 00:06:00,390
We say OK.

76
00:06:00,390 --> 00:06:01,435
Make sure that,

77
00:06:01,435 --> 00:06:05,220
so this is the kind of
thing that you have to sort

78
00:06:05,220 --> 00:06:06,861
of turn on and off.

79
00:06:06,861 --> 00:06:08,249
So if you don't want the split,

80
00:06:08,249 --> 00:06:11,231
you need to go back there

81
00:06:11,231 --> 00:06:16,231
and just re-click Analyze All
Cases, which is the default.

82
00:06:19,748 --> 00:06:23,168
So now, what we're saying,

83
00:06:23,168 --> 00:06:28,168
do males and non-males sort of
experience sleep differently?

84
00:06:29,998 --> 00:06:33,631
Is the better model one with
a separate set of betas

85
00:06:33,631 --> 00:06:36,300
for males and non-males?

86
00:06:36,300 --> 00:06:41,012
So here, we're gonna do the
same thing, a regression.

87
00:06:41,012 --> 00:06:46,012
Now, we need to get rid of all of these

88
00:06:46,232 --> 00:06:50,843
because these would be perfectly collinear

89
00:06:55,592 --> 00:07:00,058
because, for all the males,

90
00:07:00,058 --> 00:07:02,646
it would just be a column of ones.

91
00:07:02,646 --> 00:07:05,703
All the females for all of these,

92
00:07:05,703 --> 00:07:10,703
it would be a column of zeros.

93
00:07:12,202 --> 00:07:16,833
So we need to get rid of
those or the model won't work.

94
00:07:17,826 --> 00:07:20,850
And let's make sure that
we're back to where we wanted,

95
00:07:20,850 --> 00:07:25,470
total work, educ, age,
age squared, young kid.

96
00:07:25,470 --> 00:07:28,020
And note that we split the sample,

97
00:07:28,020 --> 00:07:30,333
so we're gonna see it as two groups now.

98
00:07:32,033 --> 00:07:33,810
And then, we say OK.

99
00:07:33,810 --> 00:07:34,950
And it did that.

100
00:07:34,950 --> 00:07:37,590
So you see that we could
get our own R-squared

101
00:07:37,590 --> 00:07:41,520
for both males, which are coded as 1,

102
00:07:41,520 --> 00:07:44,610
and non-males, which are coded as 0.

103
00:07:44,610 --> 00:07:47,482
Nice big F-stat.

104
00:07:47,482 --> 00:07:52,482
Each group gets its own R-squared.

105
00:07:52,740 --> 00:07:57,000
And then, also, each group
gets its own coefficients.

106
00:07:57,000 --> 00:07:59,310
So this is the non-male group here.

107
00:07:59,310 --> 00:08:04,310
Work is significant, and that's
it actually for both groups.

108
00:08:07,064 --> 00:08:11,139
Now, what we would do
is to run a Chow test.

109
00:08:11,139 --> 00:08:15,483
And I'm gonna go to my
Excel spreadsheet here.

110
00:08:17,790 --> 00:08:22,530
And I've gotta make sure
that you can see this.

111
00:08:22,530 --> 00:08:25,483
So we ran it as a pooled model.

112
00:08:25,483 --> 00:08:30,483
So we're looking at the
sum-of-squared residuals

113
00:08:30,540 --> 00:08:33,060
for all of these.

114
00:08:33,060 --> 00:08:37,320
And this number comes from,

115
00:08:37,320 --> 00:08:40,923
let's go up to the very
first one that we ran here,

116
00:08:42,076 --> 00:08:44,782
it's this, the number right here.

117
00:08:44,782 --> 00:08:47,686
So before we split the sample,

118
00:08:47,686 --> 00:08:50,099
so you could copy and paste that.

119
00:08:50,099 --> 00:08:53,370
And then, the other two numbers come

120
00:08:53,370 --> 00:08:58,370
from our last regression,
the residuals for each group.

121
00:09:02,802 --> 00:09:06,510
This is for non-males
and this is for males.

122
00:09:06,510 --> 00:09:08,850
So we take this number, and this number,

123
00:09:08,850 --> 00:09:11,880
and put it into our spreadsheet.

124
00:09:11,880 --> 00:09:15,060
And I did the math here,

125
00:09:15,060 --> 00:09:19,500
and the numerator and
the denominator here,

126
00:09:19,500 --> 00:09:23,189
the various degrees of freedom,

127
00:09:23,189 --> 00:09:28,189
and our F-stat is 2.12 and
the critical value is 1.77.

128
00:09:32,668 --> 00:09:37,668
And I think I can show
you that on the table,

129
00:09:37,736 --> 00:09:42,736
where it's six degrees of freedom

130
00:09:42,750 --> 00:09:47,750
and an infinite number
at the 0.10 is 1.77.

131
00:09:53,340 --> 00:09:58,340
At 0.05, it's 2.10, it looks like.

132
00:09:58,837 --> 00:10:01,893
So it's very close.

133
00:10:04,895 --> 00:10:07,590
So since it's bigger, then,

134
00:10:07,590 --> 00:10:12,480
we would reject our null that males

135
00:10:12,480 --> 00:10:15,368
and non-males have the same coefficients.

136
00:10:15,368 --> 00:10:18,750
And we would say that running the model

137
00:10:18,750 --> 00:10:22,742
as two different groups
is the better model

138
00:10:22,742 --> 00:10:25,923
as a result of our Chow test.
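
The spreadsheet arithmetic described above can be sketched in Python. This is a minimal sketch, assuming the standard Chow F formula: the pooled sum of squared residuals minus the sum of the two group residuals, divided by k, all over the combined group residuals divided by n - 2k. Here k counts the intercept plus the five regressors, which gives the six numerator degrees of freedom mentioned in the video. The function name `chow_f_stat` and every input number below are hypothetical placeholders, not the values from the SPSS output.

```python
def chow_f_stat(ssr_pooled, ssr_group0, ssr_group1, n_obs, k_params):
    """Chow F statistic for one sample split into two groups.

    ssr_pooled : residual sum of squares from the regression on all cases
    ssr_group0 : residual sum of squares for the group coded 0 (non-males)
    ssr_group1 : residual sum of squares for the group coded 1 (males)
    n_obs      : total number of observations across both groups
    k_params   : parameters per regression, intercept included
    """
    # Unrestricted model: each group gets its own set of betas.
    ssr_unrestricted = ssr_group0 + ssr_group1

    df_num = k_params              # restrictions being tested
    df_den = n_obs - 2 * k_params  # observations minus both sets of betas

    numerator = (ssr_pooled - ssr_unrestricted) / df_num
    denominator = ssr_unrestricted / df_den
    return numerator / denominator, df_num, df_den


# Hypothetical inputs, chosen only to show the call; replace them with the
# three residual sums of squares copied from the regression output.
f_stat, df_num, df_den = chow_f_stat(
    ssr_pooled=1000.0,
    ssr_group0=450.0,
    ssr_group1=500.0,
    n_obs=112,
    k_params=6,
)
print(f"F({df_num}, {df_den}) = {f_stat:.3f}")
```

Compare the resulting F statistic with the critical value from the F table for the same degrees of freedom. If the statistic is larger, reject the null that the two groups share the same coefficients.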