1 00:00:01,125 --> 00:00:03,053 [Instructor] Now, I'm gonna show you how 2 00:00:03,053 --> 00:00:07,950 to estimate an equation, 3 00:00:07,950 --> 00:00:10,920 both for all the respondents 4 00:00:10,920 --> 00:00:13,789 and also how to split the sample 5 00:00:13,789 --> 00:00:17,409 into two groups so that we could do a Chow test. 6 00:00:17,409 --> 00:00:19,860 So the first thing that I'm gonna do is 7 00:00:19,860 --> 00:00:22,962 to run this model right here where, 8 00:00:22,962 --> 00:00:27,383 this model estimates how much sleep 9 00:00:27,383 --> 00:00:30,749 that an individual gets based on how much they work, 10 00:00:30,749 --> 00:00:32,730 their education, their age, 11 00:00:32,730 --> 00:00:36,690 the square of their age, and whether they have young kids. 12 00:00:36,690 --> 00:00:41,690 So we go to the dataset, and we do a linear regression, 13 00:00:44,651 --> 00:00:47,920 where sleep is the Dependent 14 00:00:48,840 --> 00:00:53,840 and then the Independents are the total work, educ, 15 00:01:04,108 --> 00:01:09,108 age, age squared, and the last one is young kids. 16 00:01:17,367 --> 00:01:22,367 So this looks at just these variables. 17 00:01:30,030 --> 00:01:34,350 And note that we're not controlling here for gender, 18 00:01:34,350 --> 00:01:37,015 that we do have a dummy variable, male, 19 00:01:37,015 --> 00:01:40,710 but we're not going to use that yet. 20 00:01:40,710 --> 00:01:44,264 So we're gonna say OK. 21 00:01:44,264 --> 00:01:47,849 We look at the results, and we see that, 22 00:01:47,849 --> 00:01:52,849 again not a huge R-squared, this is as before, 23 00:01:57,180 --> 00:02:02,180 that R-squared is the regression divided by the total. 24 00:02:04,200 --> 00:02:06,406 It looks like it's .115. 25 00:02:06,406 --> 00:02:11,406 And then, here, we see that the only significant value is, 26 00:02:13,650 --> 00:02:18,650 well, total work is and education is, it's significant, 27 00:02:23,280 --> 00:02:24,810 just barely. 
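Written out, the pooled model and the R-squared described above look roughly like this. The short names totwrk, agesq-style age squared, and yngkid are assumed stand-ins for "total work", "age squared", and "young kids"; the transcript does not show the actual column names.

```latex
\begin{align*}
\text{sleep} &= \beta_0 + \beta_1\,\text{totwrk} + \beta_2\,\text{educ}
              + \beta_3\,\text{age} + \beta_4\,\text{age}^2
              + \beta_5\,\text{yngkid} + u \\
R^2 &= \frac{SS_{\text{regression}}}{SS_{\text{total}}}
     = 1 - \frac{SS_{\text{residual}}}{SS_{\text{total}}}
\end{align*}
```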
28 00:02:24,810 --> 00:02:28,650 So the more that you work and the higher education 29 00:02:28,650 --> 00:02:30,967 that you have, the more that you sleep. 30 00:02:30,967 --> 00:02:33,992 So now, we're gonna say, "Does gender matter?" 31 00:02:33,992 --> 00:02:36,510 Well, one of the ways that we could do that is 32 00:02:36,510 --> 00:02:41,177 to just include the dummy variable male. 33 00:02:41,177 --> 00:02:42,419 So let's do that. 34 00:02:42,419 --> 00:02:47,419 Let's go back to our data and run the same regression, 35 00:02:49,587 --> 00:02:54,587 except now we're just gonna add male to it, and say OK. 36 00:02:58,980 --> 00:03:00,524 And when we scroll down, 37 00:03:00,524 --> 00:03:05,524 we see that male is pretty significant 38 00:03:06,016 --> 00:03:10,140 and with a strong positive beta. 39 00:03:10,140 --> 00:03:12,603 So being male means that you sleep more. 40 00:03:13,615 --> 00:03:16,397 So there are two ways that we could do this. 41 00:03:16,397 --> 00:03:21,397 If we think that the effect of work, and education, 42 00:03:22,243 --> 00:03:27,243 and age, and all of these other things is different 43 00:03:27,570 --> 00:03:32,570 for males and non-males, we could create interaction terms. 44 00:03:33,090 --> 00:03:35,125 And I actually did that. 45 00:03:35,125 --> 00:03:40,125 So I'll show you what that looks like. 46 00:03:42,663 --> 00:03:47,663 The way that you would do that is to go to Transform and Compute. 47 00:03:49,080 --> 00:03:53,670 I won't actually do it, but you could 48 00:03:53,670 --> 00:03:58,670 name it something, and then set it equal to male times, 49 00:04:01,484 --> 00:04:05,167 and then say age, and name it that. 50 00:04:08,700 --> 00:04:10,110 I already did that. 51 00:04:10,110 --> 00:04:13,500 So you'll see that in the dataset, 52 00:04:13,500 --> 00:04:16,290 but that's how you would do it or that's how I did it. 53 00:04:16,290 --> 00:04:18,003 But we're gonna get rid of this.
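The "male times age" step can be sketched outside the menus as well. This is only an illustration in Python, not what the video does: the rows, the column names, and the helper `add_interactions` are all made up for the example.

```python
# Hypothetical rows; column names are illustrative, not taken from the dataset.
rows = [
    {"male": 1, "age": 32, "educ": 12},
    {"male": 0, "age": 45, "educ": 16},
]


def add_interactions(rows, dummy, variables):
    """Return copies of the rows with one dummy*variable column per variable."""
    out = []
    for row in rows:
        new_row = dict(row)
        for var in variables:
            # The interaction equals the variable for males and 0 otherwise.
            new_row[f"{dummy}_{var}"] = row[dummy] * row[var]
        out.append(new_row)
    return out


with_terms = add_interactions(rows, "male", ["age", "educ"])
print(with_terms[0]["male_age"], with_terms[1]["male_age"])
```

Each interaction column carries the original value for males and zero for everyone else, which is what lets the regression estimate a separate slope for the male group.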
54 00:04:22,257 --> 00:04:27,257 So now, let's run it with all of those interactions. 55 00:04:30,873 --> 00:04:34,548 So again, we're just gonna run a linear regression. 56 00:04:34,548 --> 00:04:37,200 We're gonna leave all those things in. 57 00:04:37,200 --> 00:04:42,200 And now, we're also gonna add all these interaction terms. 58 00:04:43,650 --> 00:04:44,610 And note that, 59 00:04:44,610 --> 00:04:47,430 we're gonna lose a bunch of degrees of freedom 60 00:04:47,430 --> 00:04:48,764 by doing this. 61 00:04:48,764 --> 00:04:53,764 But let's see what we get and let's look at the results. 62 00:04:55,500 --> 00:04:58,774 So again, we added a regressor so R-squared goes up. 63 00:04:58,774 --> 00:05:02,278 And now, we see that, really, 64 00:05:02,278 --> 00:05:06,999 these are not very significant. 65 00:05:06,999 --> 00:05:11,013 We see male is no longer significant, 66 00:05:11,013 --> 00:05:14,700 but age and age squared are just barely. 67 00:05:14,700 --> 00:05:19,700 So work is, and male, young kid is barely. 68 00:05:22,875 --> 00:05:27,875 So the last thing I wanted to do is to run a Chow test 69 00:05:28,171 --> 00:05:33,171 and we're gonna get rid of all of these regressors 70 00:05:34,770 --> 00:05:39,770 that have male in them, but we're gonna split the sample. 71 00:05:40,080 --> 00:05:43,140 So the way that you do that is, 72 00:05:43,140 --> 00:05:48,140 you go to Data, and we split our file, 73 00:05:50,892 --> 00:05:55,770 and we are gonna Compare Groups by male. 74 00:05:56,640 --> 00:05:59,310 So we put male there. 75 00:05:59,310 --> 00:06:00,390 We say OK. 76 00:06:00,390 --> 00:06:01,435 Make sure that, 77 00:06:01,435 --> 00:06:05,220 so this is the kind of thing that you have to sort 78 00:06:05,220 --> 00:06:06,861 of turn on and off. 
79 00:06:06,861 --> 00:06:08,249 So if you don't want that, 80 00:06:08,249 --> 00:06:11,231 you need to go back there 81 00:06:11,231 --> 00:06:16,231 and just re-click Analyze All Cases, which is the default. 82 00:06:19,748 --> 00:06:23,168 So now, what we're asking is, 83 00:06:23,168 --> 00:06:28,168 do males and non-males sort of experience sleep differently? 84 00:06:29,998 --> 00:06:33,631 Is the better model one with a separate set of betas 85 00:06:33,631 --> 00:06:36,300 for males and non-males? 86 00:06:36,300 --> 00:06:41,012 So here, we're gonna do the same thing, a regression. 87 00:06:41,012 --> 00:06:46,012 Now, we need to get rid of all of these 88 00:06:46,232 --> 00:06:50,843 because these would be perfectly collinear 89 00:06:55,592 --> 00:07:00,058 because they would be all the males. 90 00:07:00,058 --> 00:07:02,646 It would just be a column of ones. 91 00:07:02,646 --> 00:07:05,703 All the females for all of these, 92 00:07:05,703 --> 00:07:10,703 it would be a column of zeros. 93 00:07:12,202 --> 00:07:16,833 So we need to get rid of those or the model won't work. 94 00:07:17,826 --> 00:07:20,850 And let's make sure that we're back to where we wanted, 95 00:07:20,850 --> 00:07:25,470 total work, educ, age, age squared, young kid. 96 00:07:25,470 --> 00:07:28,020 And note that we split the sample, 97 00:07:28,020 --> 00:07:30,333 so we're gonna see it as two groups now. 98 00:07:32,033 --> 00:07:33,810 And then, we say OK. 99 00:07:33,810 --> 00:07:34,950 And it did that. 100 00:07:34,950 --> 00:07:37,590 So you see that we get a separate R-squared 101 00:07:37,590 --> 00:07:41,520 for both males, which are coded as 1, 102 00:07:41,520 --> 00:07:44,610 and non-males, which are coded as 0. 103 00:07:44,610 --> 00:07:47,482 Nice big F-stat. 104 00:07:47,482 --> 00:07:52,482 Each group gets its own R-squared. 105 00:07:52,740 --> 00:07:57,000 And then, also, each group gets its own coefficients.
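Why the male terms have to come out once the sample is split can be seen in a small sketch. The rows and the helper `split_by` below are hypothetical and only stand in for what the split-file step does; they are not the instructor's data.

```python
# Hypothetical rows; after splitting, the male dummy is constant in each group.
rows = [
    {"male": 1, "age": 32},
    {"male": 0, "age": 45},
    {"male": 1, "age": 51},
    {"male": 0, "age": 28},
]


def split_by(rows, key):
    """Group rows by the value of `key`, similar in spirit to a split file."""
    groups = {}
    for row in rows:
        groups.setdefault(row[key], []).append(row)
    return groups


groups = split_by(rows, "male")
summary = {}
for value, members in groups.items():
    male_column = [r["male"] for r in members]
    male_age = [r["male"] * r["age"] for r in members]
    age = [r["age"] for r in members]
    summary[value] = {
        # A column with one distinct value duplicates the intercept.
        "male_is_constant": len(set(male_column)) == 1,
        # For males the interaction is an exact copy of age.
        "male_age_equals_age": male_age == age,
        # For non-males the interaction is all zeros.
        "male_age_all_zero": all(v == 0 for v in male_age),
    }

print(summary)
```

Within the male group the dummy is a column of ones and each interaction duplicates its base variable; within the non-male group they are all zeros. Either way the extra columns add no independent information, so the regression cannot estimate them.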
106 00:07:57,000 --> 00:07:59,310 So this is the non-male group here. 107 00:07:59,310 --> 00:08:04,310 Work is significant, and that's it actually for both groups. 108 00:08:07,064 --> 00:08:11,139 Now, what we would do is to run a Chow test. 109 00:08:11,139 --> 00:08:15,483 And I'm gonna go to my Excel spreadsheet here. 110 00:08:17,790 --> 00:08:22,530 And I've gotta make sure that you can see this. 111 00:08:22,530 --> 00:08:25,483 So we ran it pooled. 112 00:08:25,483 --> 00:08:30,483 So we're looking at the sum of squared residuals 113 00:08:30,540 --> 00:08:33,060 for all of these. 114 00:08:33,060 --> 00:08:37,320 And this number comes from, 115 00:08:37,320 --> 00:08:40,923 let's go up to the very first one that we ran here, 116 00:08:42,076 --> 00:08:44,782 it's this, the number right here. 117 00:08:44,782 --> 00:08:47,686 So before we split the sample, 118 00:08:47,686 --> 00:08:50,099 so you could copy and paste that. 119 00:08:50,099 --> 00:08:53,370 And then, the other two numbers come 120 00:08:53,370 --> 00:08:58,370 from our last regression, the residuals for each group. 121 00:09:02,802 --> 00:09:06,510 This is for non-males and this is for males. 122 00:09:06,510 --> 00:09:08,850 So we take this number, and this number, 123 00:09:08,850 --> 00:09:11,880 and put it into our spreadsheet. 124 00:09:11,880 --> 00:09:15,060 And I did the math here, 125 00:09:15,060 --> 00:09:19,500 and the numerator and the denominator here, 126 00:09:19,500 --> 00:09:23,189 the various degrees of freedom, 127 00:09:23,189 --> 00:09:28,189 and our F-stat is 2.12 and the critical value is 1.77. 128 00:09:32,668 --> 00:09:37,668 And I think I can show you that on the table, 129 00:09:37,736 --> 00:09:42,736 where it's six degrees of freedom 130 00:09:42,750 --> 00:09:47,750 and an infinite number, at the 0.10 level is 1.77. 131 00:09:53,340 --> 00:09:58,340 At 0.05, it's 2.21, it looks like. 132 00:09:58,837 --> 00:10:01,893 So it's very close.
133 00:10:04,895 --> 00:10:07,590 So since it's bigger, then, 134 00:10:07,590 --> 00:10:12,480 we would reject our null that males 135 00:10:12,480 --> 00:10:15,368 and non-males have the same coefficients. 136 00:10:15,368 --> 00:10:18,750 And we would say that running the model 137 00:10:18,750 --> 00:10:22,742 as two different groups is the better model 138 00:10:22,742 --> 00:10:25,923 as a result of our Chow test.
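The spreadsheet arithmetic is not shown in the transcript, so the following is only a sketch of the standard Chow F-statistic, F = [(SSR_pooled - (SSR_1 + SSR_2)) / k] / [(SSR_1 + SSR_2) / (n - 2k)], where k counts the estimated parameters in each regression, intercept included. With five regressors plus the intercept, k = 6, which matches the six numerator degrees of freedom mentioned above. The function name `chow_f` and every input number below are made up for illustration; they are not the values from the video.

```python
def chow_f(ssr_pooled, ssr_group1, ssr_group2, k, n):
    """Chow F-statistic for testing equal coefficients across two groups.

    ssr_pooled: sum of squared residuals from the regression on all cases
    ssr_group1, ssr_group2: sums of squared residuals from each subsample
    k: parameters per regression (slopes plus the intercept)
    n: total number of observations across both groups
    """
    ssr_split = ssr_group1 + ssr_group2
    numerator = (ssr_pooled - ssr_split) / k       # k numerator df
    denominator = ssr_split / (n - 2 * k)          # n - 2k denominator df
    return numerator / denominator


# Made-up inputs, chosen only so the arithmetic is easy to follow.
f_stat = chow_f(ssr_pooled=1120.0, ssr_group1=500.0, ssr_group2=500.0, k=6, n=112)

critical_10pct = 1.77  # 10% critical value for 6 numerator df, as quoted in the video
print(round(f_stat, 2), f_stat > critical_10pct)
```

If the F-statistic exceeds the critical value, the null of identical coefficients for the two groups is rejected, which is the comparison made in the video with 2.12 against 1.77.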