WEBVTT

1
00:00:00.077 --> 00:00:00.910
(mouse clicking)

2
00:00:00.910 --> 00:00:02.430
Hello students and welcome

3
00:00:02.430 --> 00:00:05.640
to Biostat ER example nine, chapter seven.

4
00:00:05.640 --> 00:00:07.500
In this example, we will learn how

5
00:00:07.500 --> 00:00:09.750
to perform hypothesis testing

6
00:00:09.750 --> 00:00:12.180
using multiple-sample, independent,

7
00:00:12.180 --> 00:00:13.683
continuous outcome data.

8
00:00:14.850 --> 00:00:17.160
For this purpose, we will use

9
00:00:17.160 --> 00:00:19.413
ANOVA, or analysis of variance.

10
00:00:20.400 --> 00:00:22.290
For this example, I'm using problem

11
00:00:22.290 --> 00:00:23.700
14 from our textbook

12
00:00:23.700 --> 00:00:26.193
and, as always, I'll read the problem first.

13
00:00:27.480 --> 00:00:28.890
Suppose a hypertension trial

14
00:00:28.890 --> 00:00:31.180
is mounted and 18 participants

15
00:00:39.120 --> 00:00:40.380
are randomly assigned

16
00:00:40.380 --> 00:00:42.900
to one of the comparison treatments.

17
00:00:42.900 --> 00:00:44.100
Each participant takes

18
00:00:44.100 --> 00:00:45.780
the assigned medication, and his

19
00:00:45.780 --> 00:00:47.340
or her systolic blood pressure

20
00:00:47.340 --> 00:00:48.955
is recorded after six months

21
00:00:48.955 --> 00:00:51.480
on the assigned treatment.

22
00:00:51.480 --> 00:00:54.930
The data are shown in Table 7-58.

23
00:00:54.930 --> 00:00:56.550
Is there a significant difference

24
00:00:56.550 --> 00:00:57.870
in mean systolic blood

25
00:00:57.870 --> 00:01:00.390
pressure among treatments?

26
00:01:00.390 --> 00:01:05.390
Run the appropriate test at alpha = 0.05.

27
00:01:05.640 --> 00:01:06.870
Now, as a first step,

28
00:01:06.870 --> 00:01:09.810
we are going to set up the hypotheses

29
00:01:09.810 --> 00:01:12.630
and determine the level of significance.

30
00:01:12.630 --> 00:01:14.040
We have a continuous outcome,

31
00:01:14.040 --> 00:01:15.060
which is systolic blood

32
00:01:15.060 --> 00:01:19.290
pressure, measured in millimeters of mercury (mmHg).

33
00:01:19.290 --> 00:01:20.580
We are comparing the difference

34
00:01:20.580 --> 00:01:22.050
in means among three groups,

35
00:01:22.050 --> 00:01:24.780
so we have multiple-sample, independent,

36
00:01:24.780 --> 00:01:26.313
continuous outcome data.

37
00:01:27.750 --> 00:01:29.280
This calls for an ANOVA test,

38
00:01:29.280 --> 00:01:31.230
and the appropriate null hypothesis

39
00:01:31.230 --> 00:01:33.780
is that all the group means are equal;

40
00:01:33.780 --> 00:01:37.320
the alternative is that the means are not all equal.

41
00:01:37.320 --> 00:01:41.700
The level of significance here is 0.05.

42
00:01:41.700 --> 00:01:43.260
In step two, we select

43
00:01:43.260 --> 00:01:45.360
the appropriate test statistic.

44
00:01:45.360 --> 00:01:46.440
For our type of data,

45
00:01:46.440 --> 00:01:49.713
the appropriate test statistic is the F statistic.

46
00:01:52.770 --> 00:01:55.560
In step three, we set up the decision rule.

47
00:01:55.560 --> 00:01:59.040
Our level of significance is 0.05,

48
00:01:59.040 --> 00:02:01.560
and our first degrees of freedom, df1, is two

49
00:02:01.560 --> 00:02:03.330
because small k, our number

50
00:02:03.330 --> 00:02:05.280
of groups, is three; we subtract one

51
00:02:05.280 --> 00:02:07.620
from that and get two.

52
00:02:07.620 --> 00:02:10.260
Our second degrees of freedom, df2, is 15

53
00:02:10.260 --> 00:02:12.840
because big N, the total sample size, is 18,

54
00:02:12.840 --> 00:02:15.240
and we subtract three, which is the number

55
00:02:15.240 --> 00:02:18.030
of groups, to get 15.
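As a check on the decision rule, here is a minimal Python sketch using only the standard library (the helper names are hypothetical). The degrees of freedom are k - 1 and N - k, as described above. Because df1 = 2 in this problem, the upper tail of the F distribution has a closed form, P(F > f) = (1 + 2f/df2)^(-df2/2), which can be solved for the critical value instead of reading it from Table 4. This shortcut only holds when df1 = 2; for other numerator degrees of freedom you would still use a table or a statistics library.

```python
def anova_degrees_of_freedom(k: int, n_total: int) -> tuple[int, int]:
    """Return (df1, df2) for a one-way ANOVA with k groups and N total observations."""
    return k - 1, n_total - k


def f_critical_df1_2(alpha: float, df2: int) -> float:
    """Upper-alpha critical value of F(2, df2).

    Valid ONLY for df1 == 2, where P(F > f) = (1 + 2f/df2) ** (-df2/2);
    setting that equal to alpha and solving for f gives this expression.
    """
    return (df2 / 2) * (alpha ** (-2 / df2) - 1)


df1, df2 = anova_degrees_of_freedom(k=3, n_total=18)
print(df1, df2)                               # 2 15
print(round(f_critical_df1_2(0.05, df2), 2))  # 3.68
```

Reject the null hypothesis when the calculated F is at least this value.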

56
00:02:18.030 --> 00:02:20.290
So from Table 4 in the appendix

57
00:02:22.200 --> 00:02:25.200
we get our critical value of F,

58
00:02:25.200 --> 00:02:26.640
and hence we reject the

59
00:02:26.640 --> 00:02:29.340
null hypothesis if our calculated F

60
00:02:29.340 --> 00:02:32.703
is greater than or equal to 3.68.

61
00:02:33.649 --> 00:02:34.830
Now, in step four,

62
00:02:34.830 --> 00:02:37.260
we compute the test statistic.

63
00:02:37.260 --> 00:02:40.230
The data here are summarized in the following table.

64
00:02:40.230 --> 00:02:42.600
The group means are provided,

65
00:02:42.600 --> 00:02:46.110
and the group sample sizes are also listed.

66
00:02:46.110 --> 00:02:48.210
If we pool all N = 18

67
00:02:48.210 --> 00:02:49.770
observations, basically the entire

68
00:02:49.770 --> 00:02:52.380
data set, the overall mean

69
00:02:52.380 --> 00:02:55.890
or grand mean is 130.

70
00:02:55.890 --> 00:02:58.230
Now, given that the sample sizes are equal,

71
00:02:58.230 --> 00:02:59.790
n = 6 for each group,

72
00:02:59.790 --> 00:03:02.100
we can add the group means

73
00:03:02.100 --> 00:03:03.660
and divide by 3

74
00:03:03.660 --> 00:03:06.183
to calculate the grand mean or overall mean.

75
00:03:07.350 --> 00:03:10.980
I've shown the calculation below.

76
00:03:10.980 --> 00:03:12.960
If the sample sizes were not equal,

77
00:03:12.960 --> 00:03:14.550
we would add all the values

78
00:03:14.550 --> 00:03:15.450
provided

79
00:03:15.450 --> 00:03:17.910
in Table 7-58 in our textbook

80
00:03:17.910 --> 00:03:19.920
and divide by big N

81
00:03:19.920 --> 00:03:22.323
to calculate the overall mean or grand mean.

82
00:03:25.260 --> 00:03:27.090
Now we compute SSB, the sum of squares between,

83
00:03:27.090 --> 00:03:30.120
and the formula for this is given here.
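The two routes to the grand mean can be compared in a short Python sketch (the function name is hypothetical). Weighting each group mean by its sample size is the same as adding all N observations and dividing by N, so it works for unequal groups; the plain average of the group means agrees only because every group here has n = 6. The group means are the ones read out in the SSB step, with the new-treatment mean taken as 121.0, an assumption based on the remark that this group's deviations contain no decimals and sum to exactly zero, which requires a whole-number mean.

```python
def grand_mean(ns, means):
    """Grand mean as the n-weighted average of group means (= sum of all values / N)."""
    return sum(n * m for n, m in zip(ns, means)) / sum(ns)


ns = [6, 6, 6]
# standard, placebo, new treatment; new-treatment mean assumed to be 121.0
means = [122.7, 146.2, 121.0]

weighted = grand_mean(ns, means)
simple = sum(means) / len(means)  # shortcut valid only because every n_j = 6
print(round(weighted, 1), round(simple, 1))  # 130.0 130.0
```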

84
00:03:30.120 --> 00:03:32.210
SSB is the sum, over all groups,

85
00:03:32.210 --> 00:03:34.410
of the squared difference between each group mean

86
00:03:34.410 --> 00:03:36.990
and the overall mean, weighted by the group sample size.

87
00:03:36.990 --> 00:03:41.710
So the formula requires us to insert

88
00:03:42.780 --> 00:03:44.430
the sample size of each group,

89
00:03:44.430 --> 00:03:48.030
which is six, and I have inserted that here.

90
00:03:48.030 --> 00:03:52.050
Then it requires X-bar sub j,

91
00:03:52.050 --> 00:03:54.420
which is the mean of each group.

92
00:03:54.420 --> 00:03:57.480
That is 122.7 for group one,

93
00:03:57.480 --> 00:04:02.480
146.2 for group two, and 121.0 for group three.

94
00:04:03.420 --> 00:04:05.400
I have inserted those here as well.

95
00:04:05.400 --> 00:04:06.780
The final thing we have

96
00:04:06.780 --> 00:04:09.420
to insert is X-bar,

97
00:04:09.420 --> 00:04:11.670
which is the overall mean or grand mean.

98
00:04:11.670 --> 00:04:13.200
That is 130,

99
00:04:13.200 --> 00:04:15.930
and I have inserted that here as well.

100
00:04:15.930 --> 00:04:17.820
After that, all we have

101
00:04:17.820 --> 00:04:20.820
to do is the arithmetic.

102
00:04:20.820 --> 00:04:22.320
I have shown it here

103
00:04:22.320 --> 00:04:24.360
step by step,

104
00:04:24.360 --> 00:04:26.470
and you should be able to follow it very

105
00:04:29.190 --> 00:04:32.430
easily. I strongly encourage you

106
00:04:32.430 --> 00:04:34.410
to perform the calculation yourself.

107
00:04:34.410 --> 00:04:35.680
It is very important

108
00:04:36.870 --> 00:04:39.120
that you understand the process,

109
00:04:39.120 --> 00:04:40.410
and you also want to check

110
00:04:40.410 --> 00:04:42.720
whether you get the same values

111
00:04:42.720 --> 00:04:45.063
that I am providing here.
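The SSB formula, SSB = sum over groups of n_j (X-bar_j - X-bar)^2, translates directly into Python. This sketch plugs in the values read out above, n_j = 6 and grand mean 130, and assumes the new-treatment mean is 121.0 (that group's deviations are whole numbers and sum to exactly zero, so its mean must be a whole number). The function name is hypothetical. The result lands close to the 2380.2 carried into the ANOVA table; small differences come from rounding the inputs.

```python
def ss_between(ns, means, grand_mean):
    """SSB = sum over groups of n_j * (group mean - grand mean) ** 2."""
    return sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))


ns = [6, 6, 6]
# standard, placebo, new treatment; new-treatment mean assumed to be 121.0
means = [122.7, 146.2, 121.0]

ssb = ss_between(ns, means, grand_mean=130.0)
print(round(ssb, 1))  # 2380.4
```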

112
00:04:45.990 --> 00:04:48.660
Now, in this example, SS total is not

113
00:04:48.660 --> 00:04:51.450
provided to us, so we must calculate the sum

114
00:04:51.450 --> 00:04:53.103
of squares within, or

115
00:04:54.640 --> 00:04:57.570
sum of squares error, directly from the data.

116
00:04:57.570 --> 00:05:00.300
That is what we are going to do next.

117
00:05:00.300 --> 00:05:01.590
For this calculation,

118
00:05:01.590 --> 00:05:04.350
we subtract each group's mean from its individual values,

119
00:05:04.350 --> 00:05:08.230
square those differences, and then add them up

120
00:05:09.270 --> 00:05:10.950
over all groups.

121
00:05:10.950 --> 00:05:13.560
So the first thing we do is

122
00:05:13.560 --> 00:05:16.380
set up the table here.

123
00:05:16.380 --> 00:05:18.270
This is for the first group,

124
00:05:18.270 --> 00:05:19.960
which is the standard

125
00:05:23.430 --> 00:05:26.670
group, the one that received the standard treatment.

126
00:05:26.670 --> 00:05:30.273
Here we have inserted the individual values:

127
00:05:30.273 --> 00:05:31.106
124,

128
00:05:31.106 --> 00:05:31.939
111,

129
00:05:31.939 --> 00:05:32.772
133,

130
00:05:32.772 --> 00:05:33.605
125,

131
00:05:33.605 --> 00:05:34.438
128,

132
00:05:34.438 --> 00:05:35.271
115.

133
00:05:36.180 --> 00:05:37.440
In the middle column,

134
00:05:37.440 --> 00:05:40.770
we subtract the group mean from each

135
00:05:40.770 --> 00:05:42.030
of the individual values.

136
00:05:42.030 --> 00:05:44.070
This should look familiar

137
00:05:44.070 --> 00:05:45.670
because we have done this before

138
00:05:48.100 --> 00:05:50.670
when we calculated the standard deviation;

139
00:05:50.670 --> 00:05:52.950
it's a very similar calculation.
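In symbols, using n_j, X-bar_j, and X-bar as in the SSB step, and writing X_ij for the i-th value in group j (an index introduced here for clarity), the sum of squares within and its relation to the total are given below. If SS total had been provided, SSW would simply be SST - SSB; since it was not, each group's squared deviations are computed and added, as described above.

```latex
\begin{aligned}
SSW &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}_j\right)^2, \\
SST &= \sum_{j=1}^{k} \sum_{i=1}^{n_j} \left(X_{ij} - \bar{X}\right)^2 = SSB + SSW .
\end{aligned}
```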

140
00:05:52.950 --> 00:05:55.770
So when we do those subtractions,

141
00:05:55.770 --> 00:05:58.155
we get 1.3,

142
00:05:58.155 --> 00:05:58.988
negative 11.7,

143
00:06:00.214 --> 00:06:01.047
10.3,

144
00:06:01.047 --> 00:06:01.880
2.3,

145
00:06:01.880 --> 00:06:02.716
5.3,

146
00:06:02.716 --> 00:06:04.133
and negative 7.7.

147
00:06:05.280 --> 00:06:07.890
When we add all these values,

148
00:06:07.890 --> 00:06:10.170
the sum should be equal to zero,

149
00:06:10.170 --> 00:06:12.150
but it is not; it is very close:

150
00:06:12.150 --> 00:06:14.010
negative 0.2.

151
00:06:14.010 --> 00:06:14.843
Why is that?

152
00:06:14.843 --> 00:06:17.070
Because we rounded the group mean to one decimal.

153
00:06:17.070 --> 00:06:18.120
In the third column,

154
00:06:18.120 --> 00:06:19.485
we are squaring

155
00:06:19.485 --> 00:06:20.610
each

156
00:06:20.610 --> 00:06:23.670
of these deviations, and finally we add them up

157
00:06:23.670 --> 00:06:25.770
to get our final value

158
00:06:25.770 --> 00:06:28.143
for this group, which is 337.34.

159
00:06:30.720 --> 00:06:32.790
We do the exact same process

160
00:06:32.790 --> 00:06:33.990
for the placebo group,

161
00:06:33.990 --> 00:06:35.760
the group that received the placebo.

162
00:06:35.760 --> 00:06:38.490
We put the individual values here.

163
00:06:38.490 --> 00:06:40.830
Then we subtract the group mean

164
00:06:40.830 --> 00:06:42.810
and sum the deviations

165
00:06:42.810 --> 00:06:44.550
to check whether we get zero.

166
00:06:44.550 --> 00:06:46.590
Again, we will not get exactly zero

167
00:06:46.590 --> 00:06:48.090
because the group mean is rounded.

168
00:06:48.090 --> 00:06:50.040
Then we square

169
00:06:50.040 --> 00:06:51.210
the deviations in the third

170
00:06:51.210 --> 00:06:53.490
column, and finally we add them up.
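The standard-group table can be reproduced from the six values read out above. This Python sketch (the helper name is hypothetical) computes the deviations once about the rounded mean 122.7 and once about the exact mean 736/6, showing why the middle column sums to about -0.2 rather than zero, and how little the rounding changes the sum of squares.

```python
def sum_sq_dev(values, center):
    """Return (sum of deviations, sum of squared deviations) about `center`."""
    devs = [x - center for x in values]
    return sum(devs), sum(d * d for d in devs)


standard = [124, 111, 133, 125, 128, 115]  # standard-treatment values from the transcript
exact_mean = sum(standard) / len(standard)  # 736 / 6 = 122.666...

for label, center in [("rounded mean 122.7", 122.7), ("exact mean", exact_mean)]:
    total_dev, ss = sum_sq_dev(standard, center)
    # round(..., 2) + 0.0 turns a tiny negative float residue into 0.0 for display
    print(f"{label}: sum of deviations = {round(total_dev, 2) + 0.0}, SS = {ss:.2f}")
```

With the rounded mean the SS is 337.34, as in the table; with the exact mean the deviations sum to zero and the SS is 337.33.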

171
00:06:53.490 --> 00:06:55.620
Again, it's just arithmetic,

172
00:06:55.620 --> 00:06:58.980
but I strongly encourage you to do it step by step

173
00:06:58.980 --> 00:07:00.630
to make sure you are getting the values

174
00:07:00.630 --> 00:07:02.910
that I'm providing here.

175
00:07:02.910 --> 00:07:03.900
Now for the last group,

176
00:07:03.900 --> 00:07:05.550
which is the new treatment.

177
00:07:05.550 --> 00:07:10.440
Again, I have inserted all the values

178
00:07:10.440 --> 00:07:12.510
that were provided to us.

179
00:07:12.510 --> 00:07:15.690
In the middle column,

180
00:07:15.690 --> 00:07:19.050
I have subtracted the group mean from each

181
00:07:19.050 --> 00:07:20.580
of the individual values,

182
00:07:20.580 --> 00:07:22.770
and those deviations are shown here.

183
00:07:22.770 --> 00:07:25.830
In the third, or last, column,

184
00:07:25.830 --> 00:07:27.063
I have squared each deviation

185
00:07:27.063 --> 00:07:29.820
and then added those values,

186
00:07:29.820 --> 00:07:32.130
which gives 124.

187
00:07:32.130 --> 00:07:34.620
I do want to point out that

188
00:07:34.620 --> 00:07:37.950
in this group we did not have any decimals,

189
00:07:37.950 --> 00:07:41.670
so when we summed the values in the middle column,

190
00:07:41.670 --> 00:07:44.010
we got exactly zero,

191
00:07:44.010 --> 00:07:46.680
because no rounding was involved.

192
00:07:46.680 --> 00:07:50.010
Now we add the three group totals together,

193
00:07:50.010 --> 00:07:51.210
as you can see here,

194
00:07:51.210 --> 00:07:54.400
and that gives us the final SSW,

195
00:07:57.750 --> 00:08:00.690
which we need to continue our calculation.

196
00:08:00.690 --> 00:08:03.210
Now we construct the ANOVA table.
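A way to sidestep the rounded-mean issue entirely is the computational form of each group's sum of squares, sum(x^2) - (sum x)^2 / n, evaluated with exact fractions; SSW is then just the sum over groups. This Python sketch assumes integer readings and uses hypothetical function names; only the standard-group values are quoted in the transcript, so the placebo and new-treatment lists would have to be copied from Table 7-58.

```python
from fractions import Fraction


def ss_group(values):
    """Sum of squared deviations about the group's own mean, computed exactly.

    Uses sum(x^2) - (sum x)^2 / n with Fractions, so no rounded mean is involved.
    Assumes integer readings (Fraction(a, b) needs rational arguments).
    """
    n = len(values)
    total = sum(values)
    return Fraction(sum(x * x for x in values)) - Fraction(total * total, n)


def ss_within(groups):
    """SSW: add the per-group sums of squares across all groups."""
    return sum(ss_group(g) for g in groups)


standard = [124, 111, 133, 125, 128, 115]  # standard-treatment values from the transcript
print(ss_group(standard), float(ss_group(standard)))
```

For the full SSW you would call `ss_within([standard, placebo, new_treatment])`, where `placebo` and `new_treatment` are hypothetical names for the other two lists from the textbook table.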

197
00:08:03.210 --> 00:08:07.890
Here we have the sums of squares as well

198
00:08:07.890 --> 00:08:10.410
as the degrees of freedom and the mean squares.

199
00:08:10.410 --> 00:08:14.220
Again, I have inserted the values

200
00:08:14.220 --> 00:08:16.560
we obtained so far from our calculation:

201
00:08:16.560 --> 00:08:19.320
SSB = 2380.2,

202
00:08:19.320 --> 00:08:23.160
SSW = 846.2, and degrees of freedom of two and 15.

203
00:08:23.160 --> 00:08:25.410
Now we divide each sum of squares by its degrees of freedom.

204
00:08:25.410 --> 00:08:29.790
So 2380.2 divided by two

205
00:08:29.790 --> 00:08:34.740
gives us MSB = 1190.1,

206
00:08:34.740 --> 00:08:38.160
and 846.2 divided by 15

207
00:08:38.160 --> 00:08:40.860
gives us MSE = 56.4.

208
00:08:40.860 --> 00:08:45.600
Finally, we divide 1190.1 by 56.4,

209
00:08:46.620 --> 00:08:47.583
and that gives our F value:

210
00:08:50.490 --> 00:08:53.970
our calculated F is 21.1.

211
00:08:53.970 --> 00:08:56.730
In our last step, we are going

212
00:08:56.730 --> 00:08:58.320
to draw a conclusion.

213
00:08:58.320 --> 00:09:00.180
Our calculated

214
00:09:00.180 --> 00:09:04.290
F of 21.1 is greater than 3.68,

215
00:09:04.290 --> 00:09:06.870
so we have statistically significant evidence

216
00:09:06.870 --> 00:09:08.880
to reject the null hypothesis

217
00:09:08.880 --> 00:09:09.930
and conclude that there

218
00:09:09.930 --> 00:09:12.270
is a significant difference

219
00:09:12.270 --> 00:09:13.890
in mean systolic blood pressure

220
00:09:13.890 --> 00:09:16.020
among the treatment groups

221
00:09:16.020 --> 00:09:18.453
at the 5% level of significance.
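The ANOVA-table arithmetic can be reproduced from the two sums of squares. This is a sketch with hypothetical function names, not the textbook's procedure; the p-value line is an addition the lecture does not compute, and it reuses the closed form P(F > f) = (1 + 2f/df2)^(-df2/2), which is valid only because df1 = 2.

```python
def anova_from_ss(ssb, ssw, k, n_total):
    """Build the one-way ANOVA table entries from SSB and SSW."""
    df1, df2 = k - 1, n_total - k
    msb = ssb / df1  # mean square between
    mse = ssw / df2  # mean square error (within)
    return df1, df2, msb, mse, msb / mse


def f_sf_df1_2(f, df2):
    """Upper-tail probability P(F > f) for F(2, df2); closed form valid only for df1 == 2."""
    return (1 + 2 * f / df2) ** (-df2 / 2)


df1, df2, msb, mse, f_stat = anova_from_ss(2380.2, 846.2, k=3, n_total=18)
print(f"MSB = {msb:.1f}, MSE = {mse:.1f}, F = {f_stat:.1f}")  # MSB = 1190.1, MSE = 56.4, F = 21.1

reject = f_stat >= 3.68  # decision rule from step three
p_value = f_sf_df1_2(f_stat, df2)  # far below alpha = 0.05
print(reject, p_value)
```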

222
00:09:19.710 --> 00:09:21.840
Now, given that we have rejected

223
00:09:21.840 --> 00:09:22.920
the null hypothesis,

224
00:09:22.920 --> 00:09:24.331
we need to go further

225
00:09:24.331 --> 00:09:26.820
and determine which groups differ

226
00:09:26.820 --> 00:09:28.413
from each other.

227
00:09:29.400 --> 00:09:32.070
To do that, we will perform a post hoc analysis.

228
00:09:32.070 --> 00:09:35.160
I have created a separate video for that,

229
00:09:35.160 --> 00:09:38.733
and I will see you in that next video.

230
00:09:43.028 --> 00:09:43.861
(mouse clicking)