WEBVTT

1
00:00:00.750 --> 00:00:02.040
Hello students.

2
00:00:02.040 --> 00:00:05.610
Welcome to Biostat ER example 7, chapter 7.

3
00:00:05.610 --> 00:00:07.950
In this example, we will learn how to perform

4
00:00:07.950 --> 00:00:10.860
hypothesis testing for a categorical outcome

5
00:00:10.860 --> 00:00:12.543
when we have two or more samples.

6
00:00:14.730 --> 00:00:17.340
For this example, we are going to use problem 22

7
00:00:17.340 --> 00:00:20.760
from our textbook, which states:

8
00:00:20.760 --> 00:00:23.370
Use the data shown in problem 21

9
00:00:23.370 --> 00:00:26.880
and test if there is an association between mother's BMI

10
00:00:26.880 --> 00:00:28.920
and child's obesity status.

11
00:00:28.920 --> 00:00:32.130
That is, normal versus overweight/obese.

12
00:00:32.130 --> 00:00:35.370
Run the test at a 5% level of significance.

13
00:00:35.370 --> 00:00:37.990
Here I have copied and pasted the information

14
00:00:39.960 --> 00:00:42.090
and I have highlighted the values

15
00:00:42.090 --> 00:00:43.690
that we are going to be using

16
00:00:45.030 --> 00:00:46.083
in this example.

17
00:00:47.040 --> 00:00:50.010
So the first step would be to set up our hypotheses

18
00:00:50.010 --> 00:00:52.410
and determine the level of significance.

19
00:00:52.410 --> 00:00:54.300
So the null states that

20
00:00:54.300 --> 00:00:59.220
mother's BMI and child's obesity status are independent,

21
00:00:59.220 --> 00:01:02.430
and the alternative states that the null is false.

22
00:01:02.430 --> 00:01:06.510
The alpha here is 0.05, which is provided to us.

23
00:01:06.510 --> 00:01:08.190
In step 2, we are going to select

24
00:01:08.190 --> 00:01:11.460
the appropriate test statistic, which is chi-square,

25
00:01:11.460 --> 00:01:14.040
and the formula is copied and pasted here.

26
00:01:14.040 --> 00:01:17.310
The condition for appropriate use of this test statistic

27
00:01:17.310 --> 00:01:21.120
is that each expected frequency is at least five.

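For reference, the chi-square statistic named in step 2 can be written out as below. The slide's own notation is not visible in the transcript, so the symbols O (observed frequency in a cell) and E (expected frequency in the same cell) are assumed here, with the sum running over every cell of the table.

```latex
\chi^2 = \sum_{\text{all cells}} \frac{(O - E)^2}{E}
```
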
28
00:01:21.120 --> 00:01:24.240
And in step 4 we will compute the expected frequencies

29
00:01:24.240 --> 00:01:27.063
and we will ensure that the condition is met.

30
00:01:28.440 --> 00:01:31.530
In step 3, we are going to set up the decision rule.

31
00:01:31.530 --> 00:01:35.760
So the degrees of freedom here is r minus 1

32
00:01:35.760 --> 00:01:38.370
multiplied by c minus 1.

33
00:01:38.370 --> 00:01:43.140
So r here is the number of rows, and c is the number of columns.

34
00:01:43.140 --> 00:01:45.420
So we have three rows here.

35
00:01:45.420 --> 00:01:49.020
As you can see, this is the mother's BMI

36
00:01:49.020 --> 00:01:53.610
categorized into three levels.

37
00:01:53.610 --> 00:01:58.050
So we have three rows and we have two columns.

38
00:01:58.050 --> 00:02:00.000
So it is going to be 3 minus 1

39
00:02:00.000 --> 00:02:02.430
multiplied by 2 minus 1.

40
00:02:02.430 --> 00:02:05.820
So our degrees of freedom will be two.

41
00:02:05.820 --> 00:02:07.560
And when we go to the back of our book

42
00:02:07.560 --> 00:02:11.490
with two degrees of freedom and an alpha of 0.05,

43
00:02:11.490 --> 00:02:15.180
our decision will be to reject the null

44
00:02:15.180 --> 00:02:17.790
if our calculated chi-square value

45
00:02:17.790 --> 00:02:20.583
is greater than or equal to 5.99.

46
00:02:22.860 --> 00:02:26.760
Now in step 4, we are going to compute the test statistic.

47
00:02:26.760 --> 00:02:29.220
We start with the expected frequencies, and the formula here is

48
00:02:29.220 --> 00:02:31.980
expected frequency equal to row total

49
00:02:31.980 --> 00:02:36.840
multiplied by column total divided by the big N,

50
00:02:36.840 --> 00:02:39.903
which is the total sample size.

51
00:02:41.880 --> 00:02:44.850
So the top number in each cell of the table

52
00:02:44.850 --> 00:02:46.290
is the observed frequency,

53
00:02:46.290 --> 00:02:48.810
and the bottom number is the expected frequency,

54
00:02:48.810 --> 00:02:50.910
which is shown in parentheses.

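The decision rule from step 3 can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this note, and the critical value is computed from a closed form that holds for exactly two degrees of freedom (where the chi-square upper-tail probability is exp(-x/2)) rather than looked up in the table at the back of the book, as the video does.

```python
import math


def degrees_of_freedom(n_rows: int, n_cols: int) -> int:
    """Degrees of freedom for a test of independence: (r - 1) * (c - 1)."""
    return (n_rows - 1) * (n_cols - 1)


def critical_value_df2(alpha: float) -> float:
    """Upper-tail critical value of chi-square with 2 degrees of freedom.

    Assumption: this shortcut is valid ONLY for df = 2, where
    P(X > x) = exp(-x / 2), so x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)


def reject_null(chi_square: float, critical_value: float) -> bool:
    """Decision rule from step 3: reject if the statistic is >= the critical value."""
    return chi_square >= critical_value


df = degrees_of_freedom(n_rows=3, n_cols=2)   # three BMI levels, two obesity categories
critical = critical_value_df2(alpha=0.05)

print(df)                   # 2
print(round(critical, 2))   # 5.99
```

For any other number of degrees of freedom, the table in the textbook (or a statistics library) would be needed in place of `critical_value_df2`.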
55
00:02:50.910 --> 00:02:53.613
So I'm going to do a couple of calculations here.

56
00:02:54.450 --> 00:02:59.280
So for the first cell here, I'm going to do that calculation.

57
00:02:59.280 --> 00:03:02.910
We are going to take the row total, which is 56,

58
00:03:02.910 --> 00:03:05.700
and we are going to multiply it by the column total,

59
00:03:05.700 --> 00:03:07.110
which is 62,

60
00:03:07.110 --> 00:03:09.600
and we are going to divide it by 100,

61
00:03:09.600 --> 00:03:11.280
which is the big N here,

62
00:03:11.280 --> 00:03:15.180
and we will get the value 34.7.

63
00:03:15.180 --> 00:03:17.100
And that is inserted here.

64
00:03:17.100 --> 00:03:18.480
I'm going to do another one,

65
00:03:18.480 --> 00:03:20.310
which is basically the next one,

66
00:03:20.310 --> 00:03:24.320
which is again going to be multiplying the row total, 56,

67
00:03:25.440 --> 00:03:28.080
by the column total, which is 38,

68
00:03:28.080 --> 00:03:31.050
and then dividing it by 100,

69
00:03:31.050 --> 00:03:34.230
and we get 21.3.

70
00:03:34.230 --> 00:03:36.210
So I will strongly encourage you

71
00:03:36.210 --> 00:03:39.420
to go through these calculations yourself

72
00:03:39.420 --> 00:03:41.340
and make sure that you are getting the values

73
00:03:41.340 --> 00:03:43.023
that are provided here.

74
00:03:44.850 --> 00:03:49.470
So once we have computed all the expected frequencies

75
00:03:49.470 --> 00:03:52.650
and we can see that our condition is met,

76
00:03:52.650 --> 00:03:56.040
we can proceed with calculating the chi-square value.

77
00:03:56.040 --> 00:03:58.050
And for that we are going to use the formula

78
00:03:58.050 --> 00:04:02.580
that was provided here in step 2,

79
00:04:02.580 --> 00:04:07.000
which is observed minus expected

80
00:04:07.980 --> 00:04:10.620
whole squared divided by expected.

81
00:04:10.620 --> 00:04:14.640
So for the first one, the observed value is 40

82
00:04:14.640 --> 00:04:17.280
and the expected value is 34.7.

83
00:04:17.280 --> 00:04:21.180
So we are going to square the difference

84
00:04:21.180 --> 00:04:22.740
after we do the subtraction.

85
00:04:22.740 --> 00:04:26.250
And then we are going to divide it by 34.7.

86
00:04:26.250 --> 00:04:28.200
Similarly, the next one is going to be

87
00:04:28.200 --> 00:04:31.260
16 minus 21.3 whole squared

88
00:04:31.260 --> 00:04:33.480
divided by 21.3.

89
00:04:33.480 --> 00:04:36.377
And as you can see, we are following the same formula

90
00:04:38.670 --> 00:04:39.843
for each of these.

91
00:04:40.860 --> 00:04:42.360
And now what I'm going to do

92
00:04:42.360 --> 00:04:44.790
is help you understand

93
00:04:44.790 --> 00:04:47.640
how this is performed algebraically

94
00:04:47.640 --> 00:04:49.350
by doing one of the calculations.

95
00:04:49.350 --> 00:04:54.030
So when we take the first one and we take 40

96
00:04:54.030 --> 00:04:58.683
and we subtract 34.7,

97
00:05:03.960 --> 00:05:05.370
we get 5.3.

98
00:05:05.370 --> 00:05:10.370
So when we multiply 5.3 by itself, we get 28.09.

99
00:05:10.530 --> 00:05:13.740
And when we divide that by 34.7,

100
00:05:13.740 --> 00:05:16.563
we get 0.81.

101
00:05:18.810 --> 00:05:21.900
And that is exactly what is inserted here.

102
00:05:21.900 --> 00:05:25.320
So for the subsequent ones,

103
00:05:25.320 --> 00:05:28.080
if you follow the same process algebraically,

104
00:05:28.080 --> 00:05:31.470
you will get the values that are provided here.

105
00:05:31.470 --> 00:05:33.750
Again, I strongly encourage you to make sure

106
00:05:33.750 --> 00:05:35.730
that you are getting the same values

107
00:05:35.730 --> 00:05:37.803
by doing the calculation yourself.

108
00:05:38.910 --> 00:05:40.410
And once that is complete,

109
00:05:40.410 --> 00:05:42.330
we are going to add up all these values

110
00:05:42.330 --> 00:05:43.680
and that will provide us

111
00:05:43.680 --> 00:05:48.033
with our final calculated chi-square value, which is 4.95.

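If you want to check the hand calculations for the first row, a minimal Python sketch is shown below. It uses only the numbers read out in the video (row total 56, column totals 62 and 38, N of 100, observed counts 40 and 16); the other two rows of the table are not given in the transcript, so the sketch does not attempt to reproduce the full sum. The function and variable names are hypothetical, and the expected frequencies are rounded to one decimal place before use, as an assumption that mirrors the values shown on the slide.

```python
def expected_frequency(row_total: float, col_total: float, n: float) -> float:
    """Expected frequency for one cell: row total * column total / N."""
    return row_total * col_total / n


def cell_contribution(observed: float, expected: float) -> float:
    """One cell's term in the chi-square sum: (O - E)^2 / E."""
    return (observed - expected) ** 2 / expected


# Values read out in the video for the first row of the table.
row_total = 56
col_totals = [62, 38]
n = 100
observed = [40, 16]

# Round expected frequencies to one decimal, as on the slide (assumption).
expected = [round(expected_frequency(row_total, c, n), 1) for c in col_totals]

# Each cell's contribution, rounded to two decimals.
contributions = [
    round(cell_contribution(o, e), 2) for o, e in zip(observed, expected)
]

print(expected)        # [34.7, 21.3]
print(contributions)   # [0.81, 1.32]
```

Adding the contributions from all six cells of the full table is what gives the final chi-square value quoted in the video.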
112
00:05:48.870 --> 00:05:50.880
So now we are at step 5

113
00:05:50.880 --> 00:05:52.980
and we are going to draw a conclusion.

114
00:05:52.980 --> 00:05:55.770
And our conclusion here is we fail to reject the null,

115
00:05:55.770 --> 00:06:00.393
because 4.95 is less than 5.99.

116
00:06:01.410 --> 00:06:05.190
So we do not have statistically significant evidence

117
00:06:05.190 --> 00:06:08.880
at an alpha of 0.05 to show that the null is false,

118
00:06:08.880 --> 00:06:13.080
or that the mother's BMI and child's obesity status

119
00:06:13.080 --> 00:06:14.433
are not independent.

120
00:06:18.600 --> 00:06:20.130
Thank you for your time and attention,

121
00:06:20.130 --> 00:06:22.500
and please feel free to reach out if you have any questions,

122
00:06:22.500 --> 00:06:24.513
and I'll see you in the next video.