1 00:00:01,110 --> 00:00:02,880 [Instructor] So we're going to look at 2 00:00:02,880 --> 00:00:05,430 a simple linear regression 3 00:00:05,430 --> 00:00:10,430 with a couple of independent variables. 4 00:00:12,480 --> 00:00:17,460 So we're using the dataset hprice, or home price. 5 00:00:17,460 --> 00:00:21,060 So to do a regular linear regression, 6 00:00:21,060 --> 00:00:26,060 we go to the Analyze menu, then Regression, then Linear. 7 00:00:31,380 --> 00:00:35,280 And you can see I already have the dependent 8 00:00:35,280 --> 00:00:39,180 variable, which is the selling price of the homes. 9 00:00:39,180 --> 00:00:43,560 And the two independents are the square feet of the homes 10 00:00:43,560 --> 00:00:45,603 and the number of bedrooms. 11 00:00:46,710 --> 00:00:50,040 So one thing that I want to show you 12 00:00:50,040 --> 00:00:55,040 is how to save the residuals and the predicted values. 13 00:00:55,410 --> 00:00:57,873 So if you go to the Save menu, 14 00:00:59,730 --> 00:01:04,560 and you click on the Unstandardized predicted, 15 00:01:04,560 --> 00:01:08,910 that's y-hat, and the Unstandardized residuals, 16 00:01:08,910 --> 00:01:11,070 which we call u-hat, 17 00:01:11,070 --> 00:01:15,150 and hit Continue. 18 00:01:15,150 --> 00:01:20,150 And then it will give you those values as well. 19 00:01:20,520 --> 00:01:22,660 So we hit OK 20 00:01:25,620 --> 00:01:27,960 and we get our results. 21 00:01:27,960 --> 00:01:32,960 So first of all, we see R Squared: about 63% 22 00:01:34,830 --> 00:01:38,973 of the variation is explained by our regressors. 23 00:01:40,500 --> 00:01:45,500 And another way of seeing that is as the ratio 24 00:01:45,570 --> 00:01:48,723 between this number and this number. 25 00:01:49,890 --> 00:01:53,973 So 63% is explained 26 00:01:55,140 --> 00:01:59,040 and about 37% is unexplained. 27 00:01:59,040 --> 00:02:04,040 You could also do the math that this plus this equals this. 
28 00:02:04,080 --> 00:02:09,080 So this is the explained sum of squares. 29 00:02:09,450 --> 00:02:12,270 This is the unexplained sum of squares. 30 00:02:12,270 --> 00:02:14,880 And this is the total sum of squares. 31 00:02:14,880 --> 00:02:19,620 You see that our overall model is very significant 32 00:02:19,620 --> 00:02:23,130 with a p-value of less than 0.001 33 00:02:23,130 --> 00:02:25,140 and a nice big F stat. 34 00:02:25,140 --> 00:02:28,050 Let's just scroll down and look at the results. 35 00:02:28,050 --> 00:02:29,760 So what we're seeing here 36 00:02:29,760 --> 00:02:34,760 is that only the square feet is significant. 37 00:02:37,020 --> 00:02:40,833 So for every additional square foot, 38 00:02:41,790 --> 00:02:45,420 it looks like you gain about 12 cents 39 00:02:48,630 --> 00:02:50,130 or 13 cents. 40 00:02:50,130 --> 00:02:55,130 So those are the results of that. 41 00:02:57,390 --> 00:03:01,320 The other thing that I want to show you is, 42 00:03:01,320 --> 00:03:03,690 I'm going to make this small, 43 00:03:03,690 --> 00:03:07,440 and we now have two new variables. 44 00:03:07,440 --> 00:03:12,440 So PRE_1 is y-hat and RES_1 is u-hat. 45 00:03:13,170 --> 00:03:18,170 And you can see that price equals y-hat plus u-hat. 46 00:03:21,570 --> 00:03:23,250 So this is the actual y, 47 00:03:23,250 --> 00:03:26,160 well, the actual value of it. 48 00:03:26,160 --> 00:03:28,110 This is the predicted value, 49 00:03:28,110 --> 00:03:31,593 and this is the residual. 50 00:03:33,090 --> 00:03:38,090 If you were to take the mean of price and the mean of y-hat, 51 00:03:39,360 --> 00:03:40,980 they would be the same. 52 00:03:40,980 --> 00:03:45,980 And if you took either the sum or the mean of u-hat, 53 00:03:46,680 --> 00:03:51,540 both would come out to 0. 
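The identities described here (price equals y-hat plus u-hat, the residuals sum to zero, the mean of y-hat matches the mean of price, and explained plus unexplained equals total) can be checked numerically. What follows is only a minimal sketch in plain Python, not the SPSS procedure from the video. It assumes a small made-up dataset (the numbers are illustrative and are not taken from hprice) and a hypothetical helper `ols` that solves the normal equations directly.

```python
# Illustrative data only (NOT from the hprice dataset); price in $1000s.
sqft = [1400, 1600, 1700, 1875, 2100, 2350, 2450, 3000]
bdrms = [3, 3, 2, 4, 3, 4, 5, 4]
price = [245, 312, 279, 308, 405, 324, 419, 450]


def ols(X, y):
    """Hypothetical helper: solve the normal equations (X'X) b = X'y
    by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # pick the pivot row
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k] for row in A]


# The leading column of 1s is the intercept.
X = [[1.0, s, b] for s, b in zip(sqft, bdrms)]
beta = ols(X, price)

y_hat = [sum(b * x for b, x in zip(beta, row)) for row in X]  # predicted values
u_hat = [y - yh for y, yh in zip(price, y_hat)]               # residuals

y_bar = sum(price) / len(price)
ess = sum((yh - y_bar) ** 2 for yh in y_hat)  # explained sum of squares
rss = sum(u ** 2 for u in u_hat)              # unexplained sum of squares
tss = sum((y - y_bar) ** 2 for y in price)    # total sum of squares
r_squared = ess / tss                         # share of variation explained

print(f"R squared          = {r_squared:.3f}")
print(f"ESS + RSS vs TSS   = {ess + rss:.3f} vs {tss:.3f}")
print(f"sum of u-hat       = {sum(u_hat):.6f}")
print(f"mean price / y-hat = {y_bar:.3f} / {sum(y_hat) / len(y_hat):.3f}")
```

These identities rely on the model including an intercept, which is why `X` starts with a column of 1s. In the ANOVA table, the explained and unexplained sums of squares correspond to the Regression and Residual rows.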
54 00:03:51,540 --> 00:03:53,463 So let's bring our data back up, 55 00:03:55,020 --> 00:04:00,020 and let's have another look and let's do another regression. 56 00:04:00,060 --> 00:04:03,773 So I want to show you that if we add regressors, 57 00:04:06,840 --> 00:04:11,840 it's going to change the value of all the betas. 58 00:04:12,870 --> 00:04:14,670 So we're going to do another one. 59 00:04:14,670 --> 00:04:17,913 We're going to go Regression, Linear. 60 00:04:18,930 --> 00:04:23,930 This time we're going to turn off the y-hats and the u-hats. 61 00:04:24,840 --> 00:04:26,550 We really don't need them. 62 00:04:26,550 --> 00:04:29,070 And let's add a few more. 63 00:04:29,070 --> 00:04:33,957 So let's add lot size and the assessed value. 64 00:04:39,240 --> 00:04:40,830 Yeah, lot size, 65 00:04:40,830 --> 00:04:44,190 and then whether or not it is a colonial. 66 00:04:44,190 --> 00:04:46,110 And I'm just picking out a few here 67 00:04:46,110 --> 00:04:48,540 just to show you what happens. 68 00:04:48,540 --> 00:04:53,313 So again, we hit OK. 69 00:04:54,150 --> 00:04:56,100 Note that when we add regressors, 70 00:04:56,100 --> 00:05:00,060 R Squared never goes down, and here it got bigger. 71 00:05:00,060 --> 00:05:02,430 It's still highly significant. 72 00:05:02,430 --> 00:05:04,710 Our F stat is bigger. 73 00:05:04,710 --> 00:05:08,490 And note now that, 74 00:05:08,490 --> 00:05:11,970 controlling for all these other factors, 75 00:05:11,970 --> 00:05:13,830 only the assessed value is significant. 76 00:05:13,830 --> 00:05:18,660 Note too that if we take our beta 77 00:05:18,660 --> 00:05:21,000 and divide by our standard error, 78 00:05:21,000 --> 00:05:23,490 we always get our t. 79 00:05:23,490 --> 00:05:27,900 And a big t means a small significance value. 
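Two of the points made here can be sketched in code: R Squared does not go down when regressors are added, and each t value is the coefficient divided by its standard error. The following is a hedged illustration in plain Python, not the SPSS output from the video. The data are made up (not hprice), the helpers `xtx_inverse` and `fit` are hypothetical names, and for brevity only assessed value and the colonial dummy are added, not lot size.

```python
from math import sqrt

# Illustrative data only (NOT the hprice dataset); price and assess in $1000s.
sqft = [1400, 1600, 1700, 1875, 2100, 2350, 2450, 3000, 1550, 2700]
bdrms = [3, 3, 2, 4, 3, 4, 5, 4, 2, 4]
assess = [230, 300, 265, 310, 380, 330, 400, 460, 250, 410]
colonial = [1, 0, 1, 1, 0, 1, 1, 0, 0, 1]
price = [245, 312, 279, 308, 405, 324, 419, 450, 260, 398]


def xtx_inverse(X):
    """Hypothetical helper: invert X'X by Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | I]; row operations turn it into [I | (X'X)^-1].
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [float(i == j) for j in range(k)] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for r in range(k):
            if r != c:
                f = A[r][c]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[c])]
    return [row[k:] for row in A]


def fit(columns, y):
    """Hypothetical helper: OLS with an intercept.
    Returns (beta, standard errors, t values, R squared)."""
    X = [[1.0, *vals] for vals in zip(*columns)]
    n, k = len(X), len(X[0])
    inv = xtx_inverse(X)
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    beta = [sum(inv[i][j] * xty[j] for j in range(k)) for i in range(k)]
    u_hat = [yi - sum(b * x for b, x in zip(beta, r)) for r, yi in zip(X, y)]
    rss = sum(u * u for u in u_hat)
    y_bar = sum(y) / n
    tss = sum((yi - y_bar) ** 2 for yi in y)
    s2 = rss / (n - k)                             # residual variance
    se = [sqrt(s2 * inv[i][i]) for i in range(k)]  # standard errors
    t = [b / s for b, s in zip(beta, se)]          # t = beta / standard error
    return beta, se, t, 1 - rss / tss


small = fit([sqft, bdrms], price)
big = fit([sqft, bdrms, assess, colonial], price)

print(f"R squared: {small[3]:.3f} -> {big[3]:.3f}")
names = ["const", "sqft", "bdrms", "assess", "colonial"]
for name, b, s, t in zip(names, *big[:3]):
    print(f"{name:>8}: B = {b:9.4f}  SE = {s:8.4f}  t = {t:7.3f}")
```

Comparing the `sqft` and `bdrms` coefficients in `small[0]` and `big[0]` also shows the earlier point that adding regressors changes the existing betas.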
80 00:05:27,900 --> 00:05:30,690 So what we see here is 81 00:05:30,690 --> 00:05:33,090 that the only one that matters here 82 00:05:33,090 --> 00:05:37,230 is the assessed value, based on the significance. 83 00:05:37,230 --> 00:05:41,193 So hopefully this all makes sense, and thank you.