WEBVTT 1 00:00:02.250 --> 00:00:03.090 Hello students. 2 00:00:03.090 --> 00:00:06.690 Welcome to Biostats ER, example six, chapter seven. 3 00:00:06.690 --> 00:00:09.300 In this example we will learn how to perform 4 00:00:09.300 --> 00:00:12.720 hypothesis testing for a categorical outcome 5 00:00:12.720 --> 00:00:14.643 when we have two or more samples. 6 00:00:18.780 --> 00:00:20.730 We are going to use problem 29 7 00:00:20.730 --> 00:00:22.650 from our textbook for this example. 8 00:00:22.650 --> 00:00:25.383 And first, I will read the problem. 9 00:00:26.407 --> 00:00:29.370 "Some scientists believe that alcoholism is linked 10 00:00:29.370 --> 00:00:30.810 to social isolation. 11 00:00:30.810 --> 00:00:34.410 One measure of social isolation is marital status. 12 00:00:34.410 --> 00:00:37.500 A study of 280 adults is conducted 13 00:00:37.500 --> 00:00:41.580 and each participant is classified as not alcoholic, 14 00:00:41.580 --> 00:00:44.970 diagnosed alcoholic, and undiagnosed alcoholic, 15 00:00:44.970 --> 00:00:47.040 and categorized by marital status. 16 00:00:47.040 --> 00:00:50.160 Is there significant evidence of an association? 17 00:00:50.160 --> 00:00:54.810 Run the appropriate test at a 5% level of significance." 18 00:00:54.810 --> 00:00:59.280 So, here we have inserted the independent variable, 19 00:00:59.280 --> 00:01:01.380 or the exposure, which is marital status, 20 00:01:01.380 --> 00:01:03.600 Married, Not Married, as our rows, 21 00:01:03.600 --> 00:01:06.270 and our outcome, or dependent variable, 22 00:01:06.270 --> 00:01:08.280 which is the three states, 23 00:01:08.280 --> 00:01:10.830 Diagnosed Alcoholic, Undiagnosed Alcoholic, 24 00:01:10.830 --> 00:01:13.800 and Not Alcoholic, as the columns. 25 00:01:13.800 --> 00:01:15.960 So, the first step for us here 26 00:01:15.960 --> 00:01:17.820 would be to set up the hypotheses 27 00:01:17.820 --> 00:01:20.430 and determine the level of significance.
28 00:01:20.430 --> 00:01:25.430 The null hypothesis here is stated as marital status 29 00:01:26.130 --> 00:01:28.470 and alcoholism are independent, 30 00:01:28.470 --> 00:01:32.340 and the alternative basically states that the null is false. 31 00:01:32.340 --> 00:01:35.920 The level of significance, as provided in our example 32 00:01:37.170 --> 00:01:39.933 from our text, is 0.05. 33 00:01:40.800 --> 00:01:42.240 So, now we are going to select 34 00:01:42.240 --> 00:01:44.040 the appropriate test statistic, 35 00:01:44.040 --> 00:01:46.200 and for this that would be a chi-square. 36 00:01:46.200 --> 00:01:49.620 And the formula here is provided for chi-square. 37 00:01:49.620 --> 00:01:52.260 Now, chi-square, again, is a new test we are learning 38 00:01:52.260 --> 00:01:56.380 in this chapter, and for us to use chi-square 39 00:01:58.020 --> 00:02:00.390 a condition has to be met. 40 00:02:00.390 --> 00:02:01.750 And that condition 41 00:02:06.570 --> 00:02:09.120 for appropriate use of this test statistic, 42 00:02:09.120 --> 00:02:12.870 which is chi-square, is that each expected frequency 43 00:02:12.870 --> 00:02:14.760 is at least five. 44 00:02:14.760 --> 00:02:18.420 In step four we will compute the expected frequencies, 45 00:02:18.420 --> 00:02:21.720 and we will ensure that the condition is met. 46 00:02:21.720 --> 00:02:25.080 So, the next step for us here 47 00:02:25.080 --> 00:02:26.940 would be to set up our decision rule, 48 00:02:26.940 --> 00:02:31.290 and for that we need degrees of freedom equal to r minus one 49 00:02:31.290 --> 00:02:32.910 multiplied by c minus one. 50 00:02:32.910 --> 00:02:37.080 So, the r here is the number of rows, and the c here is the number of columns. 51 00:02:37.080 --> 00:02:40.710 So, we have two rows, so it's going to be two minus one, 52 00:02:40.710 --> 00:02:41.700 and we have three columns, 53 00:02:41.700 --> 00:02:43.590 so it's going to be three minus one.
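For reference, the test statistic and the degrees of freedom described in words above can be written out as below. This is a sketch in standard notation: O, E, r and c follow the transcript, while the sum over cells is spelled out here for illustration.

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E},
\qquad
df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2
```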
54 00:02:43.590 --> 00:02:47.400 And when we complete the subtraction 55 00:02:47.400 --> 00:02:50.040 and do the multiplication we get two. 56 00:02:50.040 --> 00:02:52.830 And from the back of our book, from the chi-square table, 57 00:02:52.830 --> 00:02:57.510 now we obtain that we will reject the null 58 00:02:57.510 --> 00:02:59.520 if our chi-square calculated 59 00:02:59.520 --> 00:03:03.123 is greater than or equal to 5.99. 60 00:03:04.650 --> 00:03:07.470 With that we are going to proceed to step four. 61 00:03:07.470 --> 00:03:10.293 Now we are going to compute the test statistic, 62 00:03:12.480 --> 00:03:15.150 and for that the formula that we will use 63 00:03:15.150 --> 00:03:18.420 is expected frequency equal to row total 64 00:03:18.420 --> 00:03:21.123 multiplied by column total divided by big N. 65 00:03:22.170 --> 00:03:24.570 The top number in each cell of the table 66 00:03:24.570 --> 00:03:26.940 is the observed frequency, 67 00:03:26.940 --> 00:03:29.910 and the bottom number is the expected frequency, 68 00:03:29.910 --> 00:03:32.100 which is shown in parentheses. 69 00:03:32.100 --> 00:03:36.030 So, now I'm going to do some of these calculations with you 70 00:03:36.030 --> 00:03:37.980 so you kind of see step by step 71 00:03:37.980 --> 00:03:41.070 how we are obtaining the expected frequencies. 72 00:03:41.070 --> 00:03:42.720 So, first, as you can see, 73 00:03:42.720 --> 00:03:44.700 the formula here is straightforward. 74 00:03:44.700 --> 00:03:47.850 It is row total multiplied by column total 75 00:03:47.850 --> 00:03:51.510 divided by the big N, and that is 280. 76 00:03:51.510 --> 00:03:56.510 So, now we will pull up our calculator, 77 00:03:56.670 --> 00:04:01.670 and we are going to multiply 116 by 80 78 00:04:01.740 --> 00:04:05.580 and divide that by 280. 79 00:04:05.580 --> 00:04:10.380 And once we do that we get the value here, which is 33.1.
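The calculator steps above can be reproduced with a short Python sketch. The function name `expected_frequency` is hypothetical. The critical value is recomputed here with a closed form that holds only for two degrees of freedom (upper-tail area equals exp(-x/2)), which is a stand-in for the book's table lookup and not the method used in the video.

```python
import math


def expected_frequency(row_total, column_total, n):
    """Expected count for one cell: row total * column total / N."""
    return row_total * column_total / n


# Values read out in the video: row total 116, column total 80, N = 280.
first_cell = expected_frequency(116, 80, 280)
print(round(first_cell, 1))  # 33.1

# Assumption: closed form valid only when df = 2, where the
# upper-tail probability of chi-square is exp(-x / 2).
alpha = 0.05
critical_value = -2 * math.log(alpha)
print(round(critical_value, 2))  # 5.99
```

The recomputed critical value agrees with the 5.99 taken from the chi-square table.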
80 00:04:10.380 --> 00:04:12.330 I'm going to do one more here, 81 00:04:12.330 --> 00:04:14.820 and I'm going to again take 116 82 00:04:14.820 --> 00:04:19.053 and multiply it by the column total, which is now 100. 83 00:04:19.890 --> 00:04:24.890 And now we get 11,600, we divide that by 280, 84 00:04:25.410 --> 00:04:28.590 and we get 41.4. 85 00:04:28.590 --> 00:04:31.530 So, as you can see, this is kind of the same process 86 00:04:31.530 --> 00:04:36.530 we have to repeat to get all the expected frequencies here. 87 00:04:36.840 --> 00:04:39.480 And given our expected frequencies here 88 00:04:39.480 --> 00:04:43.830 are greater than five, or at least five, as you can see, 89 00:04:43.830 --> 00:04:47.400 we can now proceed and use the chi-square statistic. 90 00:04:47.400 --> 00:04:49.860 So, the test statistic is computed as follows, 91 00:04:49.860 --> 00:04:53.340 and this is the formula that was presented in step two. 92 00:04:53.340 --> 00:04:54.990 As you can see, this is the capital sigma, 93 00:04:54.990 --> 00:04:59.667 which is a summation sign, and then it's O minus E. 94 00:05:01.860 --> 00:05:04.050 And the O here stands for observed, 95 00:05:04.050 --> 00:05:06.210 E here stands for expected, 96 00:05:06.210 --> 00:05:08.370 and then that is being whole squared 97 00:05:08.370 --> 00:05:10.770 and divided by the expected value. 98 00:05:10.770 --> 00:05:13.530 So, now let's again do a couple of these from here 99 00:05:13.530 --> 00:05:15.330 so we understand the process. 100 00:05:15.330 --> 00:05:19.650 So, the observed value for the first cell is 21, 101 00:05:19.650 --> 00:05:22.380 and the expected value is 33.1. 102 00:05:22.380 --> 00:05:25.650 So, here we have 21 minus 33.1 103 00:05:25.650 --> 00:05:28.250 whole squared divided by 33.1.
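Repeating the same calculation for every cell, and then checking the at-least-five condition, might look like the sketch below. The row total 116, the column totals 80 and 100, and N = 280 are from the video. The second row total (164) and the third column total (100) are not read out; they are inferred here by subtracting from N and should be treated as assumptions.

```python
n = 280
row_totals = [116, n - 116]              # second value (164) is inferred
column_totals = [80, 100, n - 80 - 100]  # third value (100) is inferred

# Expected count for each cell: row total * column total / N.
expected = [[r * c / n for c in column_totals] for r in row_totals]

for row in expected:
    print([round(e, 1) for e in row])

# Condition for using chi-square: every expected frequency is at least 5.
condition_met = all(e >= 5 for row in expected for e in row)
print(condition_met)
```

Under these assumptions the smallest expected count is about 33.1, so the condition holds with a wide margin.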
104 00:05:29.790 --> 00:05:34.470 And again, the next one is going to be 37 minus 41.4 105 00:05:34.470 --> 00:05:38.310 whole squared divided by 41.4 106 00:05:38.310 --> 00:05:42.840 because the 37 is the observed value 107 00:05:42.840 --> 00:05:45.663 and the 41.4 is the expected value. 108 00:05:46.500 --> 00:05:49.200 I think you kind of got the pattern here, 109 00:05:49.200 --> 00:05:52.443 so I'm not going to go through each and every cell. 110 00:05:53.310 --> 00:05:55.500 However, I'm going to go through a little bit 111 00:05:55.500 --> 00:05:57.150 with this calculation so, again, 112 00:05:57.150 --> 00:05:59.460 we can understand algebraically 113 00:05:59.460 --> 00:06:01.350 how we are getting these values. 114 00:06:01.350 --> 00:06:06.000 So, we are going to take 21 and we are going to deduct 33.1, 115 00:06:06.000 --> 00:06:09.030 and what we get is negative 12.1. 116 00:06:09.030 --> 00:06:12.000 But as you know, when we multiply a negative number 117 00:06:12.000 --> 00:06:14.670 by a negative number it becomes positive. 118 00:06:14.670 --> 00:06:19.670 So, when we multiply 12.1 by 12.1 we get positive 146.41, 119 00:06:22.050 --> 00:06:27.050 and then we divide that by 33.1 and we get positive 4.42. 120 00:06:30.750 --> 00:06:34.860 The next one I'm going to also do for you, 37 minus 41.4, 121 00:06:36.870 --> 00:06:39.270 that gives us a negative 4.4. 122 00:06:39.270 --> 00:06:43.950 And when we multiply that by another negative 4.4 123 00:06:43.950 --> 00:06:47.100 we get a positive 19.36, 124 00:06:47.100 --> 00:06:50.670 and we divide positive 19.36 by 41.4. 125 00:06:52.050 --> 00:06:55.693 And what we get is 0.468 126 00:06:58.980 --> 00:07:00.600 because we have rounded it up. 127 00:07:00.600 --> 00:07:03.900 So, similarly, you can go through this process 128 00:07:03.900 --> 00:07:06.120 and get all the values that we need here 129 00:07:06.120 --> 00:07:08.670 to calculate our chi-square value. 
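The two cell calculations just worked through can be checked with a small sketch. The helper name `cell_contribution` is hypothetical, and the expected values are the rounded ones used in the video (33.1 and 41.4).

```python
def cell_contribution(observed, expected):
    """One cell's share of chi-square: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


first = cell_contribution(21, 33.1)   # (-12.1)^2 / 33.1
second = cell_contribution(37, 41.4)  # (-4.4)^2 / 41.4

print(round(first, 2))   # 4.42
print(round(second, 3))  # 0.468
```

Squaring the difference is what makes every contribution non-negative, whichever of O and E is larger.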
130 00:07:08.670 --> 00:07:10.980 And after you calculate all these values 131 00:07:10.980 --> 00:07:13.140 the last step would be to add them up, 132 00:07:13.140 --> 00:07:16.020 and that gives us 19.7. 133 00:07:16.020 --> 00:07:19.170 So, now we are at the last step, which is the conclusion, 134 00:07:19.170 --> 00:07:21.120 and here we will reject the null 135 00:07:21.120 --> 00:07:25.470 because 19.7 is greater than 5.99. 136 00:07:25.470 --> 00:07:27.540 So, now we can conclude that we have 137 00:07:27.540 --> 00:07:30.990 statistically significant evidence at the alpha level 138 00:07:30.990 --> 00:07:35.820 of 0.05 to show that our null is false, 139 00:07:35.820 --> 00:07:40.350 or that marital status and alcoholism are not independent. 140 00:07:40.350 --> 00:07:43.440 Now, if we wanna go further and look at the p-value, 141 00:07:43.440 --> 00:07:47.010 as we learned previously, we can go to the back of our book, 142 00:07:47.010 --> 00:07:51.493 and as you can see, if you go to page number 349 143 00:07:55.980 --> 00:08:00.750 and you go to the second row here, degrees of freedom two. 144 00:08:00.750 --> 00:08:02.220 And if you go down that row 145 00:08:02.220 --> 00:08:05.343 you will see the 5.99 value that we used. 146 00:08:07.230 --> 00:08:09.540 And then if you move further you will see 147 00:08:09.540 --> 00:08:14.160 that the next value is 7.38, the next one is 9.21, 148 00:08:14.160 --> 00:08:17.460 and the last value here is 10.60. 149 00:08:17.460 --> 00:08:22.110 So, basically 19.7 is not provided in this table, 150 00:08:22.110 --> 00:08:26.910 but if it were provided it would be to the right of 10.60 151 00:08:26.910 --> 00:08:30.510 because the values here are incrementally increasing. 152 00:08:30.510 --> 00:08:32.730 Similarly, when we go to the top 153 00:08:32.730 --> 00:08:34.740 and we take a look at the p-values, 154 00:08:34.740 --> 00:08:37.080 they are incrementally decreasing. 
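The summed statistic of 19.7 and the rejection decision can be reproduced with the sketch below. Only the observed counts 21 and 37 are read out in the video. The other four counts are inferred here from the row total 116, the column totals 80 and 100, and N = 280, so the full table is an assumption. Unrounded expected counts are used, so individual terms differ slightly from the hand calculation.

```python
# Observed counts; 21 and 37 are from the video, the rest are inferred
# from the stated totals (116, 80, 100, N = 280) by subtraction.
observed = [
    [21, 37, 58],
    [59, 63, 42],
]

n = sum(sum(row) for row in observed)
row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * column_totals[j] / n  # expected frequency
        chi_square += (o - e) ** 2 / e            # (O - E)^2 / E

critical_value = 5.99  # from the chi-square table, df = 2, alpha = 0.05
reject_null = chi_square >= critical_value

print(round(chi_square, 1))  # 19.7
print(reject_null)           # True
```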
155 00:08:37.080 --> 00:08:42.080 So, the first p-value here is .10, the next one is .05, 156 00:08:42.270 --> 00:08:45.093 the subsequent one is .025, 157 00:08:46.350 --> 00:08:49.800 and the last one provided here is .005. 158 00:08:49.800 --> 00:08:54.800 Therefore, the p-value for 19.7 would be less than .005 159 00:08:57.510 --> 00:09:02.510 because again, if 19.7 were to be inserted in this table 160 00:09:03.990 --> 00:09:07.980 in this row then it would be to the right of 10.6. 161 00:09:07.980 --> 00:09:10.110 And to the right of 10.6 means 162 00:09:10.110 --> 00:09:14.220 the p-value would be smaller than .005. 163 00:09:14.220 --> 00:09:15.840 So, I hope this was helpful, 164 00:09:15.840 --> 00:09:18.390 and please let me know if you have any questions 165 00:09:18.390 --> 00:09:20.430 because I'm here to help. 166 00:09:20.430 --> 00:09:22.650 Again, thank you for your attention, 167 00:09:22.650 --> 00:09:24.400 and I'll see you in the next video.
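The table can only bound the p-value, but for this problem it can also be computed directly. The sketch below relies on a closed form that holds only for two degrees of freedom, where the upper-tail area is exp(-x/2). That formula is an addition here and is not the table method shown in the video.

```python
import math


def p_value_df2(chi_square):
    """Upper-tail p-value for chi-square; valid only when df = 2."""
    return math.exp(-chi_square / 2)


# Consistency check against the table: 10.60 sits under the .005 column.
print(round(p_value_df2(10.60), 3))  # 0.005

# The computed statistic from the video.
p_value = p_value_df2(19.7)
print(p_value < 0.005)  # True
```

The result agrees with the table reading: the p-value for 19.7 is far below .005.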