1
00:00:01,110 --> 00:00:02,880
[Instructor] So we're going to look at
2
00:00:02,880 --> 00:00:05,430
a multiple linear regression
3
00:00:05,430 --> 00:00:10,430
with a couple of independent variables.
4
00:00:12,480 --> 00:00:17,460
So we're using the dataset
hprice, for home prices.
5
00:00:17,460 --> 00:00:21,060
So to do a regular linear regression,
6
00:00:21,060 --> 00:00:26,060
we go to the Analyze menu,
then Regression, then Linear.
7
00:00:31,380 --> 00:00:35,280
And you can see I already
have the dependent
8
00:00:35,280 --> 00:00:39,180
variable set: it's the selling
price of the homes.
9
00:00:39,180 --> 00:00:43,560
And the two independents are
the square feet of the homes
10
00:00:43,560 --> 00:00:45,603
and the number of bedrooms.
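The model being fit here is price = b0 + b1*sqrft + b2*bdrms + u. A minimal sketch of what the Linear Regression procedure computes, using NumPy least squares on a small made-up dataset; the numbers and the names `sqrft`, `bdrms` and `ols` are illustrative assumptions, not the hprice data or SPSS internals:

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients [b0, b1, ...] for y on an intercept plus the regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])  # first column is the intercept
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

# Hypothetical toy data (not the hprice dataset)
sqrft = np.array([1500.0, 2000.0, 1750.0, 2400.0, 1200.0, 3000.0])
bdrms = np.array([3.0, 4.0, 3.0, 4.0, 2.0, 5.0])
# Price generated exactly from known coefficients so the fit can be checked
price = 20.0 + 0.1 * sqrft + 15.0 * bdrms

beta = ols(price, sqrft, bdrms)
print(beta)  # recovers roughly 20, 0.1, 15
```

Because the toy prices contain no noise, the fitted coefficients should match the ones used to generate them.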
11
00:00:46,710 --> 00:00:50,040
So one thing that I want to show you
12
00:00:50,040 --> 00:00:55,040
is how to save the residuals
and the predicted values.
13
00:00:55,410 --> 00:00:57,873
So if you click the Save button,
14
00:00:59,730 --> 00:01:04,560
and you check the
Unstandardized predicted values,
15
00:01:04,560 --> 00:01:08,910
that's y-hat, and the
Unstandardized residuals,
16
00:01:08,910 --> 00:01:11,070
which we call u-hat,
17
00:01:11,070 --> 00:01:15,150
and hit Continue.
18
00:01:15,150 --> 00:01:20,150
And then it will save
those values as new variables.
19
00:01:20,520 --> 00:01:22,660
So we hit OK
20
00:01:25,620 --> 00:01:27,960
and we get our results.
21
00:01:27,960 --> 00:01:32,960
So first of all, we see
R Squared is fairly high: 63%
22
00:01:34,830 --> 00:01:38,973
of the variation is
explained by our regressors.
23
00:01:40,500 --> 00:01:45,500
And another way of seeing
that is as the ratio
24
00:01:45,570 --> 00:01:48,723
of this number to this number: explained over total sum of squares.
25
00:01:49,890 --> 00:01:53,973
So 63% is explained
26
00:01:55,140 --> 00:01:59,040
and about 37% is unexplained.
27
00:01:59,040 --> 00:02:04,040
You can also do the math:
this plus this equals this.
28
00:02:04,080 --> 00:02:09,080
So this is the explained sum of squares.
29
00:02:09,450 --> 00:02:12,270
This is the unexplained sum of squares.
30
00:02:12,270 --> 00:02:14,880
And this is the total sum of squares.
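The identity total = explained + unexplained, and R Squared = explained / total, can be checked numerically. A minimal pure-Python sketch, assuming a single regressor for brevity (the decomposition holds for any OLS fit that includes an intercept); the data and the function name `sums_of_squares` are made up for illustration:

```python
def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by OLS and return (explained, unexplained, total) sums of squares."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # OLS slope and intercept for a single regressor
    b1 = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
          / sum((xi - x_bar) ** 2 for xi in x))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]
    explained = sum((yh - y_bar) ** 2 for yh in y_hat)            # regression sum of squares
    unexplained = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual sum of squares
    total = sum((yi - y_bar) ** 2 for yi in y)                     # total sum of squares
    return explained, unexplained, total

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
explained, unexplained, total = sums_of_squares(x, y)
r_squared = explained / total
print(round(r_squared, 4), round(explained + unexplained - total, 10))
```

Equivalently, R Squared = 1 - unexplained / total, which is why the explained and unexplained shares add up to 100%.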
31
00:02:14,880 --> 00:02:19,620
You see that our overall
model is very significant
32
00:02:19,620 --> 00:02:23,130
with a p-value of less than 0.001
33
00:02:23,130 --> 00:02:25,140
and a nice big F stat.
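The overall F statistic tests whether all slope coefficients are jointly zero. For an OLS model with an intercept, k regressors and n observations, it can be written in terms of R Squared as F = (R^2 / k) / ((1 - R^2) / (n - k - 1)). A small sketch of that formula; the function name and the inputs below are illustrative, not values from the output shown:

```python
def overall_f_stat(r_squared: float, n: int, k: int) -> float:
    """F statistic for H0: all k slope coefficients are zero (model includes an intercept)."""
    df_model = k           # numerator degrees of freedom
    df_resid = n - k - 1   # denominator degrees of freedom
    return (r_squared / df_model) / ((1.0 - r_squared) / df_resid)

# Illustrative inputs: R^2 = 0.5 with 2 regressors and 23 observations
print(overall_f_stat(0.5, n=23, k=2))
```

A higher R Squared (for fixed n and k) gives a bigger F and hence a smaller p-value.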
34
00:02:25,140 --> 00:02:28,050
Let's scroll down
and look at the coefficients.
35
00:02:28,050 --> 00:02:29,760
So what we're seeing here
36
00:02:29,760 --> 00:02:34,760
is that only the
square feet variable is significant.
37
00:02:37,020 --> 00:02:40,833
So for every additional square foot,
38
00:02:41,790 --> 00:02:45,420
holding bedrooms fixed, the predicted price rises by about 0.12,
39
00:02:48,630 --> 00:02:50,130
or roughly 0.13, in price units.
40
00:02:50,130 --> 00:02:55,130
So those are the
results for that model.
41
00:02:57,390 --> 00:03:01,320
The other thing that
I want to show you is this.
42
00:03:01,320 --> 00:03:03,690
I'm going to make this window smaller,
43
00:03:03,690 --> 00:03:07,440
and we now have two new variables.
44
00:03:07,440 --> 00:03:12,440
So PRE_1 is y-hat and RES_1 is u-hat.
45
00:03:13,170 --> 00:03:18,170
And you can see that price
equals y-hat plus u-hat.
46
00:03:21,570 --> 00:03:23,250
So this is the actual y,
47
00:03:23,250 --> 00:03:26,160
that is, the observed price.
48
00:03:26,160 --> 00:03:28,110
This is the predicted value,
49
00:03:28,110 --> 00:03:31,593
and this is the residual.
50
00:03:33,090 --> 00:03:38,090
If you were to take the mean
of price and the mean of y-hat,
51
00:03:39,360 --> 00:03:40,980
they would be the same.
52
00:03:40,980 --> 00:03:45,980
And if you took either the
sum or the mean of u-hat,
53
00:03:46,680 --> 00:03:51,540
you would find that both
are 0, up to rounding.
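These properties (actual = predicted + residual, mean of y-hat equals mean of y, residuals summing to zero) hold for any OLS fit that includes an intercept. A minimal NumPy sketch on simulated data, where `y_hat` and `u_hat` play the roles of the saved predicted-value and residual variables; the data-generating numbers are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50
# Made-up regressors and outcome (not the hprice data)
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 20.0 + 3.0 * x1 + 1.5 * x2 + rng.normal(scale=2.0, size=n)

X = np.column_stack([np.ones(n), x1, x2])  # the intercept column is what guarantees these properties
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

y_hat = X @ beta   # predicted values (y-hat)
u_hat = y - y_hat  # residuals (u-hat)

print(np.allclose(y, y_hat + u_hat))       # actual = predicted + residual
print(np.isclose(y.mean(), y_hat.mean()))  # same mean for y and y-hat
print(abs(u_hat.sum()) < 1e-6)             # residuals sum to (numerically) zero
```

Dropping the column of ones from `X` would break the last two properties, which is one reason regressions are normally run with a constant.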
54
00:03:51,540 --> 00:03:53,463
So let's bring our data back up,
55
00:03:55,020 --> 00:04:00,020
and let's have another look
by running another regression.
56
00:04:00,060 --> 00:04:03,773
So I want to show you
that if we add regressors,
57
00:04:06,840 --> 00:04:11,840
it will generally change
the values of all the betas.
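Why the betas move: when a new regressor is correlated with one already in the model, the coefficient on the original regressor changes, because it no longer absorbs part of the new variable's effect. A hedged NumPy sketch with simulated data, where `x2` is deliberately built to be correlated with `x1`; all names and values are assumptions for illustration:

```python
import numpy as np

def ols(y, *regressors):
    """OLS coefficients for y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(scale=0.6, size=n)  # correlated with x1 by construction
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

without_x2 = ols(y, x1)     # x2 omitted: the slope on x1 also picks up part of x2's effect
with_x2 = ols(y, x1, x2)    # x2 included: the slope on x1 is close to its true value of 2
print("slope on x1, x1 only: ", without_x2[1])
print("slope on x1, with x2: ", with_x2[1])
```

If `x2` were uncorrelated with `x1`, adding it would leave the slope on `x1` essentially unchanged; the shift comes from the correlation between regressors.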
58
00:04:12,870 --> 00:04:14,670
So we're going to do another one.
59
00:04:14,670 --> 00:04:17,913
We're going to go Regression, Linear.
60
00:04:18,930 --> 00:04:23,930
This time we're going to turn
off the y-hats and the u-hats.
61
00:04:24,840 --> 00:04:26,550
We really don't need them.
62
00:04:26,550 --> 00:04:29,070
And let's add a few more regressors.
63
00:04:29,070 --> 00:04:33,957
So let's add lot size
and the assessed value.
64
00:04:39,240 --> 00:04:40,830
So that's lot size,
65
00:04:40,830 --> 00:04:44,190
and then whether or not the house is a colonial.
66
00:04:44,190 --> 00:04:46,110
And I'm just picking out a few here
67
00:04:46,110 --> 00:04:48,540
just to show you what happens.
68
00:04:48,540 --> 00:04:53,313
So again, we hit OK.
69
00:04:54,150 --> 00:04:56,100
Note that when we add regressors,
70
00:04:56,100 --> 00:05:00,060
R Squared can never go down,
and here it went up.
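The R Squared claim can be checked directly: adding a regressor, even pure noise, cannot lower R Squared for OLS with an intercept, because the larger model could always set the new coefficient to zero and reproduce the smaller fit. A NumPy sketch with simulated data, where `noise` is a hypothetical, deliberately irrelevant regressor:

```python
import numpy as np

def r_squared(y, *regressors):
    """R^2 from an OLS fit of y on an intercept plus the given regressors."""
    X = np.column_stack([np.ones(len(y)), *regressors])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
noise = rng.normal(size=n)  # unrelated to y by construction
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

r2_small = r_squared(y, x1)
r2_big = r_squared(y, x1, noise)
print(r2_small, r2_big)     # r2_big is at least r2_small
```

This is why adjusted R Squared, which penalizes extra regressors, is often read alongside R Squared when comparing models of different sizes.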
71
00:05:00,060 --> 00:05:02,430
It's still highly significant.
72
00:05:02,430 --> 00:05:04,710
Our F stat is bigger.
73
00:05:04,710 --> 00:05:08,490
And note now that,
74
00:04:08,490 --> 00:04:11,970
controlling for all
these other factors,
75
00:04:11,970 --> 00:04:13,830
only the assessed value is significant.
76
00:05:13,830 --> 00:05:18,660
Note, too, that if we take our beta
77
00:05:18,660 --> 00:05:21,000
and divide by our standard error,
78
00:05:21,000 --> 00:05:23,490
we always get our t.
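The t in the coefficients table is each estimate divided by its standard error. A sketch of where those standard errors come from under the usual homoskedastic OLS formula, se(b_j) = sqrt(s^2 * [(X'X)^-1]_jj) with s^2 = residual sum of squares / (n - k - 1); the data are simulated, and this is the textbook formula rather than a claim about SPSS's exact internal computation:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 60, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k))])  # intercept + k regressors
y = X @ np.array([1.0, 2.0, 0.0]) + rng.normal(size=n)      # last regressor has no true effect

beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
sigma2 = resid @ resid / (n - k - 1)                        # residual variance estimate
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(X.T @ X)))      # standard errors
t = beta / se                                               # t = beta / standard error

for name, b, s, tv in zip(["const", "x1", "x2"], beta, se, t):
    print(f"{name}: beta={b:.3f}  se={s:.3f}  t={tv:.2f}")
```

The regressor with a real effect gets a large |t| and therefore a small Sig. value, while the irrelevant one typically does not.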
79
00:05:23,490 --> 00:05:27,900
And a big t means a small significance, or p-value.
80
00:05:27,900 --> 00:05:30,690
So what we see here
81
00:05:30,690 --> 00:05:33,090
is that the only regressor that matters
82
00:05:33,090 --> 00:05:37,230
is the assessed value,
judged by its significance.
83
00:05:37,230 --> 00:05:41,193
So hopefully this all
makes sense and thank you.