- [Instructor] Good morning, everyone. It's nice to be back at this. I wish things were under slightly better circumstances, but nonetheless, we're going to get through this, and I'm excited for the rest of the semester.

So this time we're going to start on panel data, which is collecting responses from multiple respondents over time. You may remember that so far, everything we've done is cross section, where it's one time period: T equals one, N equals many. Later, we're going to be doing time series, which is N equals one, T equals many. But these next two sections are about T equals many and N equals many.

So, here is a little bit of humor that I hope you appreciate.

This is the problem set based on panel data. It reflects the next two weeks of work, so it's not due until April 3rd. First, I want you to show the intuition and the math of the difference-in-differences model: show that this delta-one hat reflects the effect of the treatment in time two; you'll see what this means as we go forward. I want you to give an example of where we would use this, two locations (treatment and control) and two time periods, and explain what that is. Next, I want you to talk about the difference between fixed and random effects, and which one is better and why. And finally, the Hausman test, which is tied to number three there, where we decide which model to use; I want you to describe this test much as you did on the midterm exam for other tests.

So, again, so far we've been doing cross section, which is T equals one, N equals many. Now we're going to do both T and N greater than one, N being the number of respondents and T being the number of time periods. And we're going to talk about two main types: independently pooled cross sections and panel data.
In the first one, independently pooled cross sections, you're asking the same questions to different people at different times. In true panel data, you're following people over time, asking the same questions to the same people and tracking their responses. So if I were a panel member, the researchers would know what I said in the first time period, and the second, and the third, and however many there are, linking what I said on the survey each time.

So, again, the first type is the independently pooled cross section: you draw a random sample each time. You have a common set of questions that you ask at multiple times, but each time you're pulling a new sample, a random sample, hopefully. And because each observation in each wave is sampled independently, you're drawing a new sample each time, this gets rid of a lot of problems, especially autocorrelated errors. A good example might be a common set of questions on the Vermonter Poll: we ask them in 2019, we ask them in 2020, a totally new sample each time, but we record the answers and see how they change over time.

Versus, again, panel data, which is where we're following the same subjects, usually people, but they may be households, or firms, or cities, or many other units of analysis. It's the same subject over time: we follow individual respondents and pay attention to how each individual's responses change over time.

So now we get back to the independently pooled cross section. One of the things we need to do is account for how responses just naturally change over time: as time advances, things change, and the responses to questions change with them. The most obvious way to do this is to have a dummy variable for each year.
So say, again, we're doing the Vermonter Poll in 2019 and 2020. We simply include a dummy variable for one of those years, probably 2020, and that controls for everything that changes over time. So, holding all else equal, what is simply the effect of time, of the year advancing, on how people answer the question?

You can also, as we learned in the dummy variables section, interact this time dummy with our other regressors, income, education, age, et cetera. How does time change the slopes of those variables? What's the effect of a change in time on them? And we remember that we did that, the "different slopes for different folks" idea, when we dealt with dummy variables.

You could also, in theory, have a different dummy variable for each person, but think about what the problem would be there. I will give you a second to think; cue the Jeopardy music. The problem is that if there's a different dummy for each individual, you're very quickly going to run out of degrees of freedom, because that's a whole lot of extra variables. But most of that problem is dealt with here because we are drawing a new random sample each time. In panel data it's a little different, as we'll see.

So, again, thinking back to the "different slopes for different folks" question, or this time the "different slopes for different times" question: you can interact all the time dummies with all your other regressors, as we learned about in the dummy variable section. Or, also as we learned in that section, you can do a Chow test: split the sample by time period. So, again, with the Vermonter Poll, split it into the 2019 and 2020 responses, run each year's data separately and then pooled, and use the Chow F test to see whether the coefficients differ; a sketch of the dummy-and-interaction setup follows below. You could also do this with multiple time periods.
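Here is a minimal sketch of that setup, assuming a pooled data set with hypothetical columns satisfaction, income, age, and year (all names and numbers are made up for illustration). In statsmodels' formula syntax, C(year) creates the year dummy and income:C(year) lets the income slope differ by year:

```python
# Pooled cross section sketch: a new, independent sample each year (hypothetical data).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 200  # respondents per wave
df = pd.DataFrame({
    "year": np.repeat([2019, 2020], n),      # two independently drawn samples
    "income": rng.normal(50, 15, 2 * n),     # in $1,000s
    "age": rng.integers(18, 80, 2 * n),
})
# Build in a year-level shift and a changed income slope in 2020.
df["satisfaction"] = (3 + 0.02 * df["income"] + 0.01 * df["age"]
                      + 0.5 * (df["year"] == 2020)
                      + 0.01 * df["income"] * (df["year"] == 2020)
                      + rng.normal(0, 1, 2 * n))

# C(year): the time dummy; income:C(year): "different slopes for different times".
fit = smf.ols("satisfaction ~ income + age + C(year) + income:C(year)", data=df).fit()
print(fit.summary())
```

The Chow test version of the same question is to fit the model separately by year and pooled, then compare the residual sums of squares with an F test.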
So if you had multi-year data, 2018, 2019, 2020, you could test the null hypothesis that the slopes of all our regressors are the same in every period, that time didn't change the fundamental way people answer.

So now we're going to use a very well-known, and, it seems, kind of hot and sexy, method for certain applications of policy analysis: the so-called difference-in-differences model. This is good for a sort of natural or quasi-experiment. You can think about something happening to two different areas over two different times: some sort of treatment group where this thing happens, some kind of control group where it doesn't, and then, what is the effect of the change over time?

This is very often used, and I'm going to use an example, looking at the very bottom bullet, that is the classic application: some sort of NIMBY, Not In My Backyard, like a nasty trash incinerator or something else that people probably wouldn't want, moves into a neighborhood, and we want to measure the effect on property values. So imagine that our Y is the property value of homes in each place. Homes in the neighborhood where the trash incinerator is put in are coded as one, the treatment group, and the comparison neighborhood that doesn't get the trash incinerator is coded as zero. And then we have a time dummy variable: after the incinerator is put in and we remeasure, that's coded as one, and before it's put in, which is our baseline measure, it's coded as zero.
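To make that concrete, here's a sketch with simulated home prices (every number here is hypothetical). The coefficient on the interaction d2:dT is the difference-in-differences estimate, and the by-hand double difference of group means comes out identical:

```python
# Difference-in-differences sketch on simulated home prices (hypothetical numbers).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 100  # homes sampled per neighborhood per period
rows = []
for dT in (0, 1):        # dT = 1: treatment neighborhood (gets the incinerator)
    for d2 in (0, 1):    # d2 = 1: second period (after the incinerator)
        # Treated area starts 20 lower; prices drift +5 over time;
        # the incinerator knocks 15 more off treated homes in period two.
        price = 200 - 20 * dT + 5 * d2 - 15 * dT * d2 + rng.normal(0, 10, n)
        rows.append(pd.DataFrame({"price": price, "dT": dT, "d2": d2}))
df = pd.concat(rows, ignore_index=True)

fit = smf.ols("price ~ d2 + dT + d2:dT", data=df).fit()
print(fit.params["d2:dT"])  # delta-one hat, near the true -15

# The same number by hand: (treated - control) after, minus (treated - control) before.
m = df.groupby(["dT", "d2"])["price"].mean()
print((m.loc[(1, 1)] - m.loc[(0, 1)]) - (m.loc[(1, 0)] - m.loc[(0, 0)]))
```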
The reason that we do that, that we control for location, is that, as we know, there's a lot of evidence out there in the world of things like environmental racism, where they tend to put the NIMBYs, these nasty, polluting, dangerous things, in neighborhoods where poor folks live, or maybe people of color or other marginalized groups. You're not going to see a trash incinerator put into a very wealthy neighborhood most of the time, because those folks tend to have more political agency and will raise a big fuss, so that's kind of a non-starter. But sadly, we do tend to put these things in marginalized neighborhoods. So we want to control for that: we choose a marginalized neighborhood that doesn't get the trash incinerator and one that does, so that we're controlling for what drives property values in those neighborhoods.

So we take baseline measures: we measure property values in a random sample of the control and treatment neighborhoods. Then they put in this NIMBY, this trash incinerator. We wait until we think the treatment has had an effect, and we remeasure.

The way we do this is we run a regression, of course: y = beta0 + delta0*d2 + beta1*dT + delta1*(d2*dT) + u, where Y is the property value. dT is our dummy variable for the treatment, so beta1 is the effect of the treatment independent of time. Delta0, on the time dummy d2, is the effect of time independent of the treatment. And then, finally, the real variable of interest is this delta1, where we multiply d2 times dT, both time and treatment together. For everybody except those who are in the neighborhood that gets the NIMBY, observed in time two, that interaction is zero. So delta1 is seen as being the average treatment effect. And I'm asking you on the homework to do the math that shows that delta-one hat equals the equation with the Y-bars that you see there in the bullet points.
So the first term in parentheses is the change in the time over treatment, wait a minute, that's not right, I'm sorry. The first term is the difference between the two neighborhoods in period two, and the second term is the difference between the two neighborhoods in period one: delta-one hat = (Ybar_treated,2 - Ybar_control,2) - (Ybar_treated,1 - Ybar_control,1). So I apologize for that, and I'm going to have you work out how that works.

Now we move to panel data. This is, again, where we're not drawing a random sample each time; we're measuring the responses of the same people over time.

So one of the things that we have to do is account for the fact that we're not drawing a random sample. We use what's called the fixed effects model, and there are a few of them; you're going to see an example of one today, and then we're going to do another one next time. It's very common that panel data are prone to omitted variable bias, and, as we'll see, they're also very prone to endogeneity. You can only ask a certain number of questions on a survey, so there's a danger of omitted variable bias: lots of things go into the error term.

What we do in these fixed effects models is break our error term into fixed and what we call idiosyncratic stochastic elements. The idiosyncratic stochastic elements take us back to the beginning of the class: that's the bingo-ball wheel, spin the wheel, draw an error term, all kinds of random things that may or may not happen to you, good and bad. The fixed ones are things that are really related to you being you. What are the factors that are really hard to measure, that are just fixed in being you? Every individual is different, and we can't measure every factor about every individual, so we try to control for that.
And we're going to see how we do that next.

So here is the classic fixed effects model. We have one regressor, the regressor of interest, x1. So we model it as y_it = beta0 + delta0*d2_t + beta1*x_it + a_i + u_it, and note that there are lots of i's and t's here: i indexes the ith person or city. I'm going to show you an example of how this might be applied, so hang on for the next slide.

This assumes that we ask two questions on a survey: a Y question and an X1 question. Y is our dependent variable, X is our independent variable, and those are the variables of interest that we want to model here. So y_it is the ith person's observation in time t. The d2 is simply a dummy variable for time two: it's coded as one if the data came from the second time period and zero if they came from the first.

The really interesting thing here is this a_i. So a_i, or I'm just calling it a1 here, is all those individual factors that affect the individual: the unique attributes of a given person or place, an individual respondent, that don't change over time. One way of thinking about it is something like your genetic potential. Or, if the unit of analysis is a city, just what makes Burlington, Burlington, what makes Barre, Barre, and what makes Winooski, Winooski. This is the fixed effect: the unobserved things, all those things that we really can't measure, that make Burlington, Burlington and Winooski, Winooski, and that really don't change over time. But we assume that they really are highly correlated with X, and usually they are. Or you can think of it as something like a dummy variable for person i.
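Here's a small simulation sketch of why that correlation matters (all numbers hypothetical): the individual effect a_i is built to be correlated with x, and pooled OLS that ignores a_i, leaving it in the error term, overstates the true slope:

```python
# Heterogeneity bias sketch: a_i is correlated with x, so pooled OLS is biased.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 500                        # people, each observed twice (T = 2)
a = rng.normal(0, 2, n)        # a_i: "what makes you, you", unobserved
beta0, delta0, beta1 = 1.0, 0.5, 2.0

x = np.concatenate([a + rng.normal(0, 1, n),        # period 1: x moves with a_i
                    a + rng.normal(0, 1, n) + 1])   # period 2
d2 = np.repeat([0.0, 1.0], n)                       # time-two dummy
u = rng.normal(0, 1, 2 * n)                         # idiosyncratic error
y = beta0 + delta0 * d2 + beta1 * x + np.tile(a, 2) + u

# Pooled OLS leaves a_i hiding in the error term, correlated with x.
fit = sm.OLS(y, sm.add_constant(np.column_stack([d2, x]))).fit()
print(fit.params)  # the slope on x lands well above the true beta1 = 2.0
```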
So, if our Y is expenditure and our X is income, and we're following people over time and seeing how their expenditure on some good changes as X changes, then a_i is just what makes you, you. If you're a respondent, it's all those unique, really hard to measure things that just make you, you. It's like controlling for what makes you, you and me, me; that's what we can think of as a_i. And then u_it is the bingo-ball error: all those weird things that may or may not happen to you over time. They're very much like the cross-section errors that we've learned about, in that they have those same properties.

So here's an example. A food co-op wants to see how spending in their store changes with household income and household size. I think we would hypothesize that the slopes of both of these would be positive: as folks earn more income, they're going to spend a little bit more money there, and as their household size increases, they have more mouths to feed and they're going to spend more money. So they did a survey, again, this is fictional, in 2012 and 2017. They asked their members these three questions, or maybe they already know what members spent because they can scan the membership card, and then they asked: what was your annual income in that year, and what was your household size in that year?

So here is fictional data, as if members of this class took this survey. Connor, Carrie, Hons, Katie, me, everybody took this survey two times, and this is how the data would look in a data table. The first two rows are Connor's answers: you can see he answered in the years 2012 and 2017, they measured his expenditure, and he gave his household income and his household size.
Now look at the subscripts. If you look at Connor's income in 2012, that's x_{1,1,1}, so look at the i's and t's and all of that. The first subscript, the one for income, is the variable number, the regressor number; that stays the same. The second one says it is respondent one. And the third goes with the time period: you can see in Connor's rows that it goes from x_{1,1,1} to x_{1,1,2}, where the two is time two. It would be the same for everybody else; this is how the data would be presented and organized. So when you look at the Xs, the first subscript is the variable number, the second is for individual i, and the third subscript goes with the time period.

So let's look again at this same slide with the fixed effects model, now that you maybe have your mind around it, and assume we're working with this City Market data. y_it is expenditure in the store. Beta naught, as before, is what expenditure would be if all the regressors were zero. d2_t is a dummy variable: one in year 2017, zero for all the data collected in 2012. And then we see the two regressors, x1 and x2, income and household size. And then everybody gets this a_i; everybody has this. You can see here in the bullets it says "a1 = unobserved effect"; that should be a_i, but this is taking me so long that I don't want to go back and fix it, I apologize.

So a_i is the dummy variable for you. You took this survey twice. This is all the things that make you, you: all those things that you really can't measure very easily, or that just wouldn't fit on the survey, but that don't change over time, like where you like to shop, and how hungry you are, and lots and lots of other things that you really can't measure. And then u_it is the roll of the bingo-ball wheel: your random error term each period, based on whatever might or might not happen, you get a flat tire, or your dog gets sick, or whatever.
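Here's a sketch of that long-format layout (every number is hypothetical; the names are just class members from the slide). Each person contributes one row per survey year:

```python
# Long-format panel layout: one row per person per year (hypothetical numbers).
import pandas as pd

panel = pd.DataFrame({
    "person":  ["Connor", "Connor", "Carrie", "Carrie"],
    "year":    [2012, 2017, 2012, 2017],
    "expend":  [1800, 2400, 2100, 2500],        # y_it
    "income":  [42000, 55000, 61000, 64000],    # x_{1it}
    "hh_size": [2, 3, 1, 2],                    # x_{2it}
})
panel["d2"] = (panel["year"] == 2017).astype(int)  # time-two dummy
print(panel)
```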
So, when we actually model this, the error term is a composite error, composed of two parts: the a_i, your individual sort of dummy variable, what makes you, you, as well as this u_it. And all the things that make you, you are almost certainly correlated with your household size and your income. So if you don't account for a_i, the error term that we observe, this v_it = a_i + u_it, is going to be very highly correlated with your Xs. So we have to control for it somehow. It's a form of omitted variable bias called heterogeneity bias: the observed error is going to be highly correlated with your Xs.

The good news is that since a_i is constant over time, it only varies by place, or, in this City Market example, it only varies by person, we can difference it out; we can subtract it out mathematically. Look in your book to see the math of what we do here, but basically we take every individual's observations and subtract the time-period-one value from the time-period-two value.

So, again, this gives us what's called the first-difference equation. Basically, we transform all the variables by subtracting the value in the first time period from the value in the second time period. So, in our City Market example, we take my Y, that is, what I spent in 2017, and subtract what I spent in 2012, and that is this Delta y_i for me. And in the same way, the change in X for me, my income in 2017 minus my income in 2012, is my Delta x_i.
So one way of thinking about this is that my new Y, if you will, is the change in what I spent, and beta one measures how the change in income drives the change in expenditure: roughly, the change in expenditure (the numerator) over the change in income (the denominator). So think about it as: how does the change in income drive the change in expenditure? And now we've basically just turned it into a cross-section equation, where we're looking at how the change over time in one variable affects the change over time in another.

One of the big assumptions here concerns the change in the error term. The change between 2012 and 2017 in my draws from the bingo-ball wheel, Delta u_i, has to be uncorrelated with the change in X, my change in income. This is part of what's called strict exogeneity.

The most important thing here, perhaps, is that since this a_i is constant, it is subtracted out: the a_i from 2017, that which makes me, me, doesn't change, so when you subtract my a_i in 2012 from my a_i in 2017, it equals zero and it drops out of the equation. So we just subtracted away, and got rid of, this really dangerous endogeneity that we know causes bias. Then we can just run OLS on this, and as long as all of our other assumptions are met, we call this the first-difference estimator. It looks at how the change in Y moves as the change in X changes. I know this is a little bit confusing, but hopefully this makes sense. And because the math in PowerPoint is so messy, I really encourage you to go back and look at the textbook. I will also post the textbook slides, which render the math a lot more neatly.
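Here's a sketch of the first-difference estimator on simulated data shaped like the City Market table (hypothetical numbers throughout). Differencing within person removes a_i even though it's correlated with income, and plain OLS on the differences recovers the slope:

```python
# First-difference estimator sketch: difference out a_i, then run OLS (T = 2).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 400
a = rng.normal(0, 2, n)                    # a_i, built to correlate with income
income = np.concatenate([30 + 5 * a + rng.normal(0, 3, n),
                         33 + 5 * a + rng.normal(0, 3, n)])
d2 = np.repeat([0.0, 1.0], n)              # time-two dummy (2017)
expend = 5 + 1.5 * d2 + 0.4 * income + np.tile(a, 2) + rng.normal(0, 1, 2 * n)

df = pd.DataFrame({"person": np.tile(np.arange(n), 2),
                   "year": np.repeat([2012, 2017], n),
                   "expend": expend, "income": income}).sort_values(["person", "year"])

# Delta y_i = y_i,2017 - y_i,2012, and likewise for x; a_i cancels in the subtraction.
d = df.groupby("person")[["expend", "income"]].diff().dropna()
fd = sm.OLS(d["expend"], sm.add_constant(d["income"])).fit()
print(fd.params)  # intercept near 1.5 (the delta0 time effect), slope near the true 0.4
```

One design point worth noticing: a time-constant regressor (say, a household size that never changes) would difference to exactly zero and has to be left out, which previews the requirements below.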
So here are some factors that are needed, the requirements for this to work. Again, once we transform our data, subtracting the time-period-one value from the time-period-two value for our Y and all our Xs, we can just run OLS.

The biggest thing is that we still have to be very careful about avoiding endogeneity: the change in the error term, Delta u_i, must be uncorrelated with the change in X, Delta x_i. That's much the same as requiring the error term to be uncorrelated with the Xs in cross-sectional OLS, and we handle it in the same way: we include enough Xs to make sure that what's left in the error is truly random. We control for as much as we can.

Note that we cannot use a lagged dependent variable here. Sometimes, when you're working with multiple time periods, we might think that last year's expenditure would be an important explanatory variable, that last year's value as a regressor might be important in explaining this year's. We cannot do that, because last year's dependent variable has last year's error in it, so it would contaminate the assumption that this year's error term is uncorrelated with all the other factors.

Another important factor is that Delta X must vary over time. If X is constant, if it doesn't change over time, say, some sort of dummy variable like female, and the person is a female in both time periods, then it will subtract out and you completely lose that variable. So it's really important that we choose Xs that will vary over time. If the unit of analysis is a city, we couldn't use something like elevation, or its area in square miles, or other things that don't change, because they will simply subtract out.

The other thing that's required is homoskedasticity, so we test for heteroskedasticity and adjust for it much the same as before.
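One way to run that check, sketched with statsmodels' Breusch-Pagan test and reusing the fd and d objects from the first-difference sketch above; if the test rejects, a common fallback is robust (HC) standard errors:

```python
# Heteroskedasticity check on the differenced regression (continues the sketch above).
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

lm_stat, lm_pval, f_stat, f_pval = het_breuschpagan(fd.resid, sm.add_constant(d["income"]))
print(f"Breusch-Pagan LM p-value: {lm_pval:.3f}")

# A small p-value suggests heteroskedasticity; refit with robust standard errors.
fd_robust = sm.OLS(d["expend"], sm.add_constant(d["income"])).fit(cov_type="HC1")
print(fd_robust.bse)
```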
And this is another area where, if you're going to be using panel data, talk to your advisor, talk to me, and we can walk you through these kinds of heteroskedasticity tests and adjustments beyond this class.

So, we can also difference more than two time periods. This is pretty straightforward when you only have two, but it might be that, in our example, City Market also measured these variables in 2007, so we have three time periods now. Here is what each equation would look like: this y_it = delta_1 + ... now includes dummy variables for periods two and three. So if the new period one is 2007, then d2 is a dummy for 2012 and d3 is a dummy for 2017. And we measure all of our Xs and we get much the same result as before. We still have the same number, N, of respondents, but now we have T = 1, 2, 3.

So now, when we look at the intercepts, what would be the intercepts for times one, two, and three? In time one, it's just this delta one. In time two, it's delta one plus delta two. And in time three, it's delta one plus delta three.

And what we're really interested in is these betas, as with any kind of econometric analysis; we want to know our betas. All we do, all along, is try to get good betas; all we want is good betas. So, again, it's important that the strict exogeneity condition holds: the covariates of all individuals over all times must be uncorrelated with the error terms of all individuals over all times. This is the strict exogeneity condition, which, if violated, will make for biased betas. Again, we need this strict exogeneity; it's a very important assumption here.

So, what we do with these three time periods is difference twice. First we take the equation for time two and subtract the equation for time one, and we get these deltas.
That gives us the changes in the time dummies and, most critically, the changes in our Ys and the changes in our Xs, and this is what we regress. So think back to our City Market example: if we're looking at time two minus time one, that's comparing the change from 2007 to 2012. Delta x1 is the change in income, Delta x2 is the change in household size, and Delta y is the change in expenditure between those two years. Then we do the same sort of subtracting for the second pair, time three minus time two, which is 2017 minus 2012. Here we subtract each individual's expenditure in 2012 from their expenditure in 2017 to get Delta y, and in the same way we subtract, for each individual, the income in each of those years and the household size in each of those years, and save those differences as our new delta variables.

As long as we do this, if all those assumptions that we've been working with so far hold, and if we have this strict exogeneity, then we can use all the things that we're most familiar with: our OLS t tests, F tests, all the things that we've been working with so far all work, and it all holds.

Note that there are a few things we need to deal with. There's no true intercept in this model, because beta naught is a constant and it drops out. It can also be a little bit problematic computing R squared. So this is, again, a time when, if you're going to be doing panel data in your thesis, in your research, talk to your advisor, talk to me, and we'll walk you through it.

So, we can do this for any number of time periods: you simply include a dummy for each time period past your first one. The very first time that you collect data is your omitted case, your baseline, and for each other time period you create a dummy equal to one if the data were collected in that year and zero if not; a sketch with three waves follows below.
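Here's that three-wave version as a sketch (2007, 2012, 2017; all numbers hypothetical): difference each adjacent pair of years within person, keep a dummy for the later difference, and run OLS on the stacked differences:

```python
# Three-wave first differences (2007, 2012, 2017), hypothetical numbers.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n, years = 300, [2007, 2012, 2017]
a = rng.normal(0, 2, n)  # a_i, correlated with income below
rows = []
for t, yr in enumerate(years):
    income = 30 + 2 * t + 5 * a + rng.normal(0, 3, n)
    expend = 5 + 1.0 * t + 0.4 * income + a + rng.normal(0, 1, n)
    rows.append(pd.DataFrame({"person": np.arange(n), "year": yr,
                              "income": income, "expend": expend}))
df = pd.concat(rows, ignore_index=True).sort_values(["person", "year"])

# Difference within person: each person now contributes T - 1 = 2 rows.
d = df.groupby("person")[["expend", "income"]].diff()
d["year"] = df["year"].to_numpy()
d = d.dropna()

# Dummy marking the 2012 -> 2017 difference; 2007 -> 2012 is the baseline.
d["d3"] = (d["year"] == 2017).astype(int)
print(smf.ols("expend ~ income + d3", data=d).fit().params)  # income slope near 0.4
```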
Ideally, we have what's called a balanced panel, where we have observations for all N respondents over all T time periods. It's also very important that we test for serial correlation and for heteroskedasticity, and do it in that order; the serial correlation test is something we will learn about later on when we do time series.

There is a set of pitfalls with using this technique. First, the strict exogeneity assumption is, as I say, strict, and it may or may not hold. The two practical ones: there's much less variation in the Xs, since you're just taking the difference, and as we know, less variation in X leads to a less efficient estimator. As well, for every individual we lose one observation, so if we have two time periods, we go from 2N observations to N observations. Again, less information, and therefore a less efficient estimator. So those are the downsides, but these are the things that we do to get rid of the bias. And this is a theme we're going to talk about through much of the rest of the semester: the trade-off of bias and efficiency. So, stay tuned.