WEBVTT
1
00:00:00.077 --> 00:00:00.910
(mouse clicking)
2
00:00:00.910 --> 00:00:02.430
Hello students and welcome
3
00:00:02.430 --> 00:00:05.640
to Biostat ER example nine, chapter seven.
4
00:00:05.640 --> 00:00:07.500
In this example, we will learn how
5
00:00:07.500 --> 00:00:09.750
to perform hypothesis testing
6
00:00:09.750 --> 00:00:12.180
by using multiple sample independent
7
00:00:12.180 --> 00:00:13.683
continuous outcome data.
8
00:00:14.850 --> 00:00:17.160
And for this purpose we will be using
9
00:00:17.160 --> 00:00:19.413
ANOVA, or analysis of variance.
10
00:00:20.400 --> 00:00:22.290
And for this example, I'm using problem
11
00:00:22.290 --> 00:00:23.700
14 from our textbook
12
00:00:23.700 --> 00:00:26.193
and as always, I'll read the problem first.
13
00:00:27.480 --> 00:00:28.890
Suppose a hypertension trial
14
00:00:28.890 --> 00:00:31.180
is mounted and 18 participants
15
00:00:39.120 --> 00:00:40.380
are randomly assigned
16
00:00:40.380 --> 00:00:42.900
to one of the comparison treatments.
17
00:00:42.900 --> 00:00:44.100
Each participant takes
18
00:00:44.100 --> 00:00:45.780
the assigned medication and his
19
00:00:45.780 --> 00:00:47.340
or her systolic blood pressure
20
00:00:47.340 --> 00:00:48.955
is recorded after six months,
21
00:00:48.955 --> 00:00:51.480
on the assigned treatment.
22
00:00:51.480 --> 00:00:54.930
The data are shown in table 7-58.
23
00:00:54.930 --> 00:00:56.550
Is there a significant difference
24
00:00:56.550 --> 00:00:57.870
in mean systolic blood
25
00:00:57.870 --> 00:01:00.390
pressure among treatments?
26
00:01:00.390 --> 00:01:05.390
Run the appropriate test at alpha equal to 0.05.
27
00:01:05.640 --> 00:01:06.870
Now as a first step,
28
00:01:06.870 --> 00:01:09.810
we are going to set up the hypothesis
29
00:01:09.810 --> 00:01:12.630
and determine level of significance.
30
00:01:12.630 --> 00:01:14.040
We have a continuous outcome
31
00:01:14.040 --> 00:01:15.060
which is systolic blood
32
00:01:15.060 --> 00:01:19.290
pressure measured in millimeters of mercury.
33
00:01:19.290 --> 00:01:20.580
We are comparing the difference
34
00:01:20.580 --> 00:01:22.050
in means between three groups
35
00:01:22.050 --> 00:01:24.780
so we have multiple sample independent
36
00:01:24.780 --> 00:01:26.313
continuous outcome data.
37
00:01:27.750 --> 00:01:29.280
This is an ANOVA test
38
00:01:29.280 --> 00:01:31.230
and the appropriate hypothesis
39
00:01:31.230 --> 00:01:33.780
for null will be all the means are equal
40
00:01:33.780 --> 00:01:37.320
and for the alternative, the means are not all equal.
41
00:01:37.320 --> 00:01:41.700
And the level of significance here is 0.05.
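Written out in symbols, the hypotheses just stated look roughly like this. The labels \mu_1, \mu_2 and \mu_3 are assumed here for the mean systolic blood pressure in each of the three treatment groups; they are not notation taken from the slides.

```latex
H_0:\ \mu_1 = \mu_2 = \mu_3
\qquad
H_1:\ \text{the means are not all equal}
\qquad
\alpha = 0.05
```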
42
00:01:41.700 --> 00:01:43.260
In step two we will select
43
00:01:43.260 --> 00:01:45.360
the appropriate test statistic.
44
00:01:45.360 --> 00:01:46.440
For our type of data,
45
00:01:46.440 --> 00:01:49.713
the appropriate test statistic is the F statistic.
46
00:01:52.770 --> 00:01:55.560
In step three, we are going to set up the decision rule.
47
00:01:55.560 --> 00:01:59.040
Our level of significance is 0.05
48
00:01:59.040 --> 00:02:01.560
and our degrees of freedom one is two
49
00:02:01.560 --> 00:02:03.330
because our small K, our number
50
00:02:03.330 --> 00:02:05.280
of groups is three, we deduct one
51
00:02:05.280 --> 00:02:07.620
from that, we get two.
52
00:02:07.620 --> 00:02:10.260
Our degrees of freedom two is going to be 15
53
00:02:10.260 --> 00:02:12.840
because our big N is equal to 18.
54
00:02:12.840 --> 00:02:15.240
And then we deduct three, which is the number
55
00:02:15.240 --> 00:02:18.030
of groups, we get 15.
56
00:02:18.030 --> 00:02:20.290
So from table four in the appendix
57
00:02:22.200 --> 00:02:25.200
we get our F critical value
58
00:02:25.200 --> 00:02:26.640
and hence we will reject
59
00:02:26.640 --> 00:02:29.340
the null hypothesis if our F calculated
60
00:02:29.340 --> 00:02:32.703
is greater than or equal to 3.68.
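The degrees of freedom and the decision rule from this step can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this sketch, and the critical value 3.68 is simply the table value quoted in this example, not something the code looks up.

```python
def anova_degrees_of_freedom(k: int, n_total: int) -> tuple[int, int]:
    """Return (df1, df2) = (k - 1, N - k) for a one-way ANOVA."""
    return k - 1, n_total - k


# Critical value quoted in this example for df1 = 2, df2 = 15, alpha = 0.05.
F_CRITICAL = 3.68


def reject_null(f_calculated: float, f_critical: float = F_CRITICAL) -> bool:
    """Decision rule: reject H0 when the calculated F reaches the critical value."""
    return f_calculated >= f_critical


df1, df2 = anova_degrees_of_freedom(k=3, n_total=18)
print(df1, df2)
```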
61
00:02:33.649 --> 00:02:34.830
Now in step four
62
00:02:34.830 --> 00:02:37.260
we will compute the test statistic.
63
00:02:37.260 --> 00:02:40.230
The data here are summarized in the following table.
64
00:02:40.230 --> 00:02:42.600
Basically the group means are provided
65
00:02:42.600 --> 00:02:46.110
and the group sample sizes are also listed again.
66
00:02:46.110 --> 00:02:48.210
If we pool N equal to 18,
67
00:02:48.210 --> 00:02:49.770
basically the entire number
68
00:02:49.770 --> 00:02:52.380
of observations, the overall mean
69
00:02:52.380 --> 00:02:55.890
or the grand mean will be 130.
70
00:02:55.890 --> 00:02:58.230
Now given that the sample sizes are equal,
71
00:02:58.230 --> 00:02:59.790
with N equal to six for each group,
72
00:02:59.790 --> 00:03:02.100
we can add the group means
73
00:03:02.100 --> 00:03:03.660
and divide it by 3
74
00:03:03.660 --> 00:03:06.183
and calculate the grand mean or overall mean.
75
00:03:07.350 --> 00:03:10.980
And I've shown the calculation here below.
76
00:03:10.980 --> 00:03:12.960
Now, if the sample sizes were not equal,
77
00:03:12.960 --> 00:03:14.550
we would add all the values
78
00:03:14.550 --> 00:03:15.450
which are provided
79
00:03:15.450 --> 00:03:17.910
in table 7-58 in our textbook
80
00:03:17.910 --> 00:03:19.920
and divide it by the big N
81
00:03:19.920 --> 00:03:22.323
to calculate the overall mean or grand mean.
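The two routes to the grand mean described here can be written as one formula. The symbols are assumptions of this sketch: X_{ij} is the i-th value in group j, n_j is the size of group j, k is the number of groups and N is the total number of observations.

```latex
\bar{X} \;=\; \frac{1}{N}\sum_{j=1}^{k}\sum_{i=1}^{n_j} X_{ij}
\;=\; \frac{1}{N}\sum_{j=1}^{k} n_j\,\bar{X}_j ,
\qquad
\text{and if } n_1 = \dots = n_k:\quad
\bar{X} \;=\; \frac{1}{k}\sum_{j=1}^{k} \bar{X}_j
```

The shortcut of averaging the group means is only valid in the equal-sample-size case on the right.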
82
00:03:25.260 --> 00:03:27.090
Now we will compute the SSB
83
00:03:27.090 --> 00:03:30.120
and the formula for this is given here.
84
00:03:30.120 --> 00:03:32.210
And SSB is basically the sum,
85
00:03:32.210 --> 00:03:34.410
weighted by group size, of the squared differences
86
00:03:34.410 --> 00:03:36.990
between each group mean and the overall mean.
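For reference, the standard between-groups sum of squares formula is written out below. The notation (n_j for the size of group j, \bar{X}_j for its mean, \bar{X} for the grand mean, k for the number of groups) is assumed here to match the pieces described in this example.

```latex
SSB \;=\; \sum_{j=1}^{k} n_j \left(\bar{X}_j - \bar{X}\right)^2
```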
87
00:03:36.990 --> 00:03:41.710
So basically the formula here requires us to insert
88
00:03:42.780 --> 00:03:44.430
the sample size from each group
89
00:03:44.430 --> 00:03:48.030
and that is six and I have inserted that here.
90
00:03:48.030 --> 00:03:52.050
Then it requires us to insert X-bar sub j,
91
00:03:52.050 --> 00:03:54.420
which is the mean from each group.
92
00:03:54.420 --> 00:03:57.480
So that is 122.7 for group one,
93
00:03:57.480 --> 00:04:02.480
146.2 for group two and 121.0 for group three.
94
00:04:03.420 --> 00:04:05.400
And I have inserted that here as well.
95
00:04:05.400 --> 00:04:06.780
And then the final thing we have
96
00:04:06.780 --> 00:04:09.420
to insert here is the X bar,
97
00:04:09.420 --> 00:04:11.670
which is the overall mean or grand mean.
98
00:04:11.670 --> 00:04:13.200
And that is 130
99
00:04:13.200 --> 00:04:15.930
and I have inserted that here as well.
100
00:04:15.930 --> 00:04:17.820
So after that, basically all we have
101
00:04:17.820 --> 00:04:20.820
to do is do the algebraic process.
102
00:04:20.820 --> 00:04:22.320
And I have shown you that here
103
00:04:22.320 --> 00:04:24.360
in a step by step fashion
104
00:04:24.360 --> 00:04:26.470
and you should be able to follow it very
105
00:04:29.190 --> 00:04:32.430
easily and I strongly encourage you
106
00:04:32.430 --> 00:04:34.410
to perform the calculation yourself.
107
00:04:34.410 --> 00:04:35.680
It is very important
108
00:04:36.870 --> 00:04:39.120
that you understand the process
109
00:04:39.120 --> 00:04:40.410
and you also want to see
110
00:04:40.410 --> 00:04:42.720
if you're getting the same values
111
00:04:42.720 --> 00:04:45.063
or numbers that I am providing here.
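If you want to check the SSB arithmetic yourself, a minimal Python sketch along these lines may help. The function name is hypothetical, and the third group mean is taken as 121.0, which is an assumption: it is the value consistent with the whole-number deviations mentioned for that group.

```python
def sum_of_squares_between(group_means, group_sizes, grand_mean):
    """SSB: each squared (group mean - grand mean) weighted by its group size."""
    return sum(
        n * (mean - grand_mean) ** 2
        for mean, n in zip(group_means, group_sizes)
    )


# Rounded group means; the third (121.0) is an assumption, see the note above.
group_means = [122.7, 146.2, 121.0]
group_sizes = [6, 6, 6]
grand_mean = 130.0

ssb = sum_of_squares_between(group_means, group_sizes, grand_mean)
print(round(ssb, 2))
```

With these inputs the sketch gives roughly 2380.4. Rounding each squared difference to one decimal first (53.3, 262.4 and 81.0) and then multiplying their sum by six gives 2380.2, which appears to be how the figure used in the ANOVA table was obtained.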
112
00:04:45.990 --> 00:04:48.660
Now in this example, SS total is not
113
00:04:48.660 --> 00:04:51.450
provided to us, hence we must calculate the sum
114
00:04:51.450 --> 00:04:53.103
of squares within, or
115
00:04:54.640 --> 00:04:57.570
sum of squares error, separately.
116
00:04:57.570 --> 00:05:00.300
And that is what we are going to do next.
117
00:05:00.300 --> 00:05:01.590
For this calculation,
118
00:05:01.590 --> 00:05:04.350
we will use the individual values and the group mean
119
00:05:04.350 --> 00:05:08.230
for each group, square the differences, and then add them up
120
00:05:09.270 --> 00:05:10.950
over all groups.
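In symbols, the within-groups (error) sum of squares described here is usually written as follows. The notation is assumed for this sketch: X_{ij} is the i-th value in group j, \bar{X}_j is that group's mean, n_j is its size and k is the number of groups.

```latex
SSW \;=\; \sum_{j=1}^{k}\sum_{i=1}^{n_j}\left(X_{ij} - \bar{X}_j\right)^2
```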
121
00:05:10.950 --> 00:05:13.560
So the first thing we have to do is go
122
00:05:13.560 --> 00:05:16.380
and set up the table here.
123
00:05:16.380 --> 00:05:18.270
And this is for the first group,
124
00:05:18.270 --> 00:05:19.960
which is the standard
125
00:05:23.430 --> 00:05:26.670
or the group that received the standard treatment.
126
00:05:26.670 --> 00:05:30.273
And here we have inserted the individual values,
127
00:05:30.273 --> 00:05:31.106
124,
128
00:05:31.106 --> 00:05:31.939
111,
129
00:05:31.939 --> 00:05:32.772
133,
130
00:05:32.772 --> 00:05:33.605
125,
131
00:05:33.605 --> 00:05:34.438
128,
132
00:05:34.438 --> 00:05:35.271
115.
133
00:05:36.180 --> 00:05:37.440
Here in the middle column,
134
00:05:37.440 --> 00:05:40.770
what we are doing is deducting the group mean from each
135
00:05:40.770 --> 00:05:42.030
of the individual values.
136
00:05:42.030 --> 00:05:44.070
And this should come across as pretty familiar
137
00:05:44.070 --> 00:05:45.670
because we have done this before
138
00:05:48.100 --> 00:05:50.670
when we performed the calculation for standard deviation,
139
00:05:50.670 --> 00:05:52.950
it's a very similar calculation.
140
00:05:52.950 --> 00:05:55.770
So now we are going to do those subtractions
141
00:05:55.770 --> 00:05:58.155
and we get values like 1.3,
142
00:05:58.155 --> 00:05:58.988
negative 11.7,
143
00:06:00.214 --> 00:06:01.047
10.3,
144
00:06:01.047 --> 00:06:01.880
2.3,
145
00:06:01.880 --> 00:06:02.716
5.3,
146
00:06:02.716 --> 00:06:04.133
and negative 7.7.
147
00:06:05.280 --> 00:06:07.890
So now when we add all these values,
148
00:06:07.890 --> 00:06:10.170
the sum should be equal to zero,
149
00:06:10.170 --> 00:06:12.150
but it is not, it is very close,
150
00:06:12.150 --> 00:06:14.010
it is -0.2.
151
00:06:14.010 --> 00:06:14.843
And why is that?
152
00:06:14.843 --> 00:06:17.070
That's because we rounded the group mean to one decimal place.
153
00:06:17.070 --> 00:06:18.120
Now in the third column
154
00:06:18.120 --> 00:06:19.485
what we are doing is we are taking
155
00:06:19.485 --> 00:06:20.610
the square of each
156
00:06:20.610 --> 00:06:23.670
of these values and finally we are adding them up
157
00:06:23.670 --> 00:06:25.770
to get our final value
158
00:06:25.770 --> 00:06:28.143
for this group, which is 337.34.
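The table for the standard treatment group can be reproduced with a short Python sketch. The six values and the rounded mean of 122.7 come from this example; the variable names are made up for illustration. The second half repeats the calculation with the unrounded mean to show where the -0.2 comes from.

```python
from statistics import fmean

# Individual systolic blood pressures for the standard treatment group.
standard = [124, 111, 133, 125, 128, 115]

# Middle column: value minus the group mean rounded to one decimal.
rounded_mean = 122.7
devs_rounded = [round(x - rounded_mean, 1) for x in standard]
sum_devs_rounded = round(sum(devs_rounded), 1)

# Third column: squared deviations, then their total for the group.
ss_rounded = round(sum(d ** 2 for d in devs_rounded), 2)

# Same calculation with the unrounded mean, for comparison.
exact_mean = fmean(standard)
devs_exact = [x - exact_mean for x in standard]
ss_exact = sum(d ** 2 for d in devs_exact)

print(devs_rounded, sum_devs_rounded, ss_rounded)
print(round(sum(devs_exact), 6), round(ss_exact, 2))
```

With the rounded mean the deviations sum to -0.2 and the squares total 337.34. With the unrounded mean the deviations sum to zero (up to floating-point error) and the total is about 337.33, so the rounding has very little effect on the result.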
159
00:06:30.720 --> 00:06:32.790
We are going to do the exact same process
160
00:06:32.790 --> 00:06:33.990
for the placebo group
161
00:06:33.990 --> 00:06:35.760
or the group that received the placebo.
162
00:06:35.760 --> 00:06:38.490
We are going to put the individual values here.
163
00:06:38.490 --> 00:06:40.830
Then we are going to subtract the group mean,
164
00:06:40.830 --> 00:06:42.810
and then we are going to do the summation
165
00:06:42.810 --> 00:06:44.550
to see if we get a zero or not.
166
00:06:44.550 --> 00:06:46.590
Again, we are not going to get an exact zero
167
00:06:46.590 --> 00:06:48.090
because of rounding.
168
00:06:48.090 --> 00:06:50.040
Then we are going to square
169
00:06:50.040 --> 00:06:51.210
those differences in the third
170
00:06:51.210 --> 00:06:53.490
column and finally we are going to add them up.
171
00:06:53.490 --> 00:06:55.620
So again, it's an algebraic process,
172
00:06:55.620 --> 00:06:58.980
but I strongly encourage you to do it step by step
173
00:06:58.980 --> 00:07:00.630
to make sure you are getting the values
174
00:07:00.630 --> 00:07:02.910
that I'm providing here.
175
00:07:02.910 --> 00:07:03.900
Now for the last group,
176
00:07:03.900 --> 00:07:05.550
which is the new treatment.
177
00:07:05.550 --> 00:07:10.440
Again, here I have inserted all the values basically
178
00:07:10.440 --> 00:07:12.510
that were provided to us.
179
00:07:12.510 --> 00:07:15.690
And then in the middle column
180
00:07:15.690 --> 00:07:19.050
I have deducted the group mean from each
181
00:07:19.050 --> 00:07:20.580
of the individual values.
182
00:07:20.580 --> 00:07:22.770
So that is again provided here.
183
00:07:22.770 --> 00:07:25.830
And then in the third or the last column,
184
00:07:25.830 --> 00:07:27.063
I have squared each value
185
00:07:27.063 --> 00:07:29.820
and then I have added those values
186
00:07:29.820 --> 00:07:32.130
and here I have 124.
187
00:07:32.130 --> 00:07:34.620
Now I do wanna bring it to your attention here that
188
00:07:34.620 --> 00:07:37.950
because in this group we did not have any decimals,
189
00:07:37.950 --> 00:07:41.670
when we summed up the values in the middle column,
190
00:07:41.670 --> 00:07:44.010
we actually got a perfect zero,
191
00:07:44.010 --> 00:07:46.680
since there was no rounding involved.
192
00:07:46.680 --> 00:07:50.010
Now we are going to add all these values up
193
00:07:50.010 --> 00:07:51.210
as you can see here,
194
00:07:51.210 --> 00:07:54.400
and that will give us the final SSW
195
00:07:57.750 --> 00:08:00.690
that we need to move forward with our calculation.
196
00:08:00.690 --> 00:08:03.210
So now we are going to construct the ANOVA table.
197
00:08:03.210 --> 00:08:07.890
Here we have the sums of squares as well
198
00:08:07.890 --> 00:08:10.410
as the degrees of freedom and the mean squares.
199
00:08:10.410 --> 00:08:14.220
And again, I have inserted all these values here
200
00:08:14.220 --> 00:08:16.560
that we obtained so far from our calculation,
201
00:08:16.560 --> 00:08:19.320
2380.2
202
00:08:19.320 --> 00:08:23.160
846.2, degrees of freedom two and 15.
203
00:08:23.160 --> 00:08:25.410
And now we have to do a division.
204
00:08:25.410 --> 00:08:29.790
So 2380.2 will be divided by two
205
00:08:29.790 --> 00:08:34.740
and that will give us 1190.1
206
00:08:34.740 --> 00:08:38.160
and we have to take 846.2 divided by 15
207
00:08:38.160 --> 00:08:40.860
and that will give us 56.4.
208
00:08:40.860 --> 00:08:45.600
So finally we have to divide 1190.1 by 56.4,
209
00:08:46.620 --> 00:08:47.583
and from that, our F value,
210
00:08:50.490 --> 00:08:53.970
our F calculated will be 21.1.
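The ANOVA table arithmetic can be checked with a small Python sketch. The sums of squares and degrees of freedom are the ones quoted in this example; the function name is hypothetical, and the comparison against 3.68 uses the critical value stated earlier.

```python
def anova_f(ssb: float, ssw: float, df1: int, df2: int):
    """Return (MSB, MSW, F) from the sums of squares and degrees of freedom."""
    msb = ssb / df1   # mean square between groups
    msw = ssw / df2   # mean square within groups (error)
    return msb, msw, msb / msw


msb, msw, f_calc = anova_f(ssb=2380.2, ssw=846.2, df1=2, df2=15)
print(round(msb, 1), round(msw, 1), round(f_calc, 1))

# Decision rule from step three: reject H0 if F >= 3.68.
print(f_calc >= 3.68)
```

Whether MSW is rounded to 56.4 before dividing or kept unrounded, F comes out as 21.1 to one decimal place.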
211
00:08:53.970 --> 00:08:56.730
So now as part of our last step, we are going
212
00:08:56.730 --> 00:08:58.320
to draw a conclusion.
213
00:08:58.320 --> 00:09:00.180
So we have calculated
214
00:09:00.180 --> 00:09:04.290
that F is greater than 3.68.
215
00:09:04.290 --> 00:09:06.870
So we have statistically significant evidence
216
00:09:06.870 --> 00:09:08.880
to reject the null hypothesis
217
00:09:08.880 --> 00:09:09.930
and conclude that there
218
00:09:09.930 --> 00:09:12.270
is a significant difference
219
00:09:12.270 --> 00:09:13.890
in the mean systolic blood pressure
220
00:09:13.890 --> 00:09:16.020
among the treatment groups
221
00:09:16.020 --> 00:09:18.453
at the 5% level of significance.
222
00:09:19.710 --> 00:09:21.840
Now, given that we have rejected
223
00:09:21.840 --> 00:09:22.920
the null hypothesis,
224
00:09:22.920 --> 00:09:24.331
we need to move forward in terms
225
00:09:24.331 --> 00:09:26.820
of determining which groups are different
226
00:09:26.820 --> 00:09:28.413
from each other, okay?
227
00:09:29.400 --> 00:09:32.070
Therefore, we will perform a post-hoc analysis.
228
00:09:32.070 --> 00:09:35.160
And for that purpose, I created a new video
229
00:09:35.160 --> 00:09:38.733
and I will see you in that next video.
230
00:09:43.028 --> 00:09:43.861
(mouse clicking)