1 00:00:01,170 --> 00:00:03,425 Hi everyone, and welcome to this lecture 2 00:00:03,425 --> 00:00:07,833 on ways to reduce or avoid confounding. 3 00:00:09,780 --> 00:00:11,971 So just a brief recap. 4 00:00:11,971 --> 00:00:14,550 We learned this in epi one 5 00:00:14,550 --> 00:00:17,970 and then we briefly looked at it in module one. 6 00:00:17,970 --> 00:00:18,803 So to be a confounder, 7 00:00:18,803 --> 00:00:22,344 a variable must be associated with the exposure, 8 00:00:22,344 --> 00:00:26,553 associated with the outcome, and not on the causal pathway. 9 00:00:27,870 --> 00:00:31,195 Today we'll talk about ways to reduce or avoid confounding 10 00:00:31,195 --> 00:00:33,164 and the different things that we'll talk about 11 00:00:33,164 --> 00:00:35,220 are discussed below. 12 00:00:35,220 --> 00:00:38,070 So restriction, matching, randomization, 13 00:00:38,070 --> 00:00:41,023 stratification, and multivariate analysis. 14 00:00:41,023 --> 00:00:43,681 And when I go through examples, 15 00:00:43,681 --> 00:00:47,199 I like to try to talk about apples and oranges 16 00:00:47,199 --> 00:00:49,730 or other fruits, something that's completely 17 00:00:49,730 --> 00:00:52,438 kind of out of the medical field 18 00:00:52,438 --> 00:00:55,773 to simplify things as much as I can. 19 00:00:56,627 --> 00:00:58,830 So the first thing that we can do 20 00:00:58,830 --> 00:01:03,830 to decrease the risk of confounding in our study 21 00:01:04,228 --> 00:01:05,694 is called restriction. 22 00:01:05,694 --> 00:01:08,710 So restriction is when we limit the study to individuals 23 00:01:08,710 --> 00:01:13,003 who fall within a specified category of each confounder. 24 00:01:13,003 --> 00:01:15,155 So in our example below, 25 00:01:15,155 --> 00:01:18,196 we're now comparing apples to apples. 26 00:01:18,196 --> 00:01:20,144 The limitation of restriction 27 00:01:20,144 --> 00:01:25,080 is that we reduce the generalizability of our study.
28 00:01:25,080 --> 00:01:29,400 So we are now looking at apples versus apples, 29 00:01:29,400 --> 00:01:30,517 we're not looking at fruits, 30 00:01:30,517 --> 00:01:32,560 and the results from our apples study 31 00:01:32,560 --> 00:01:36,696 can't necessarily be generalized to all fruits. 32 00:01:36,696 --> 00:01:40,394 We can also have difficulties recruiting participants. 33 00:01:40,394 --> 00:01:42,578 And this becomes more of a problem 34 00:01:42,578 --> 00:01:45,157 if we are restricting on several variables. 35 00:01:45,157 --> 00:01:49,381 So for example, if I wanted to look at green apples 36 00:01:49,381 --> 00:01:52,593 that were grown in Vermont in a certain month, 37 00:01:52,593 --> 00:01:57,593 I may have trouble getting enough apples to do a good study. 38 00:01:59,648 --> 00:02:01,998 The next thing we can do to decrease confounding 39 00:02:01,998 --> 00:02:04,916 in our studies is called matching. 40 00:02:04,916 --> 00:02:06,587 And this is where we select subjects 41 00:02:06,587 --> 00:02:08,450 such that the confounders are distributed 42 00:02:08,450 --> 00:02:11,760 in an identical manner among groups. 43 00:02:11,760 --> 00:02:15,205 This is really useful in case-control studies. 44 00:02:15,205 --> 00:02:18,081 We have to be careful of overmatching, 45 00:02:18,081 --> 00:02:20,108 which occurs when cases and controls 46 00:02:20,108 --> 00:02:22,321 are inadvertently matched 47 00:02:22,321 --> 00:02:26,084 on a characteristic that is potentially causal. 48 00:02:26,084 --> 00:02:28,095 For example, if you matched 49 00:02:28,095 --> 00:02:32,610 on smoking status when the outcome is lung cancer, 50 00:02:32,610 --> 00:02:37,610 we can't then look at the impact of smoking on lung cancer 51 00:02:37,864 --> 00:02:40,852 because we've matched for that variable. 52 00:02:40,852 --> 00:02:43,231 There are two types of matching.
53 00:02:43,231 --> 00:02:45,244 We have individual matching, 54 00:02:45,244 --> 00:02:47,725 which is where, for every case, you match one 55 00:02:47,725 --> 00:02:50,152 or more controls with the same value. 56 00:02:50,152 --> 00:02:53,321 So here we would match every apple to every apple 57 00:02:53,321 --> 00:02:56,308 or every green apple to every green apple, 58 00:02:56,308 --> 00:02:58,661 or every green apple grown organically 59 00:02:58,661 --> 00:02:59,965 in the state of Vermont 60 00:02:59,965 --> 00:03:01,911 to every green apple grown organically 61 00:03:01,911 --> 00:03:03,492 in the state of Vermont. 62 00:03:03,492 --> 00:03:05,930 And then we have frequency matching, 63 00:03:05,930 --> 00:03:07,891 which is where you create a control group 64 00:03:07,891 --> 00:03:10,938 with similar means and proportions of confounding variables 65 00:03:10,938 --> 00:03:12,792 as the case group. 66 00:03:12,792 --> 00:03:15,924 So here we would get a group of apples and oranges 67 00:03:15,924 --> 00:03:20,410 and then match that group's characteristics to a group 68 00:03:22,088 --> 00:03:24,693 which will be our control group. 69 00:03:26,310 --> 00:03:27,959 The strengths and limitations of matching 70 00:03:27,959 --> 00:03:29,630 are shown on the screen. 71 00:03:29,630 --> 00:03:33,202 So matching is intuitive and easy to explain. 72 00:03:33,202 --> 00:03:35,735 It guarantees a certain degree of comparability 73 00:03:35,735 --> 00:03:39,522 even in small studies, and it's efficient, 74 00:03:39,522 --> 00:03:43,276 especially if you're matching on a strong confounder. 75 00:03:43,276 --> 00:03:45,888 The limitations are that it's costly, 76 00:03:45,888 --> 00:03:49,922 it's logistically complicated, it can be inefficient 77 00:03:49,922 --> 00:03:52,281 if you're matching on a weak confounder, 78 00:03:52,281 --> 00:03:55,033 and you can't study the matching variable.
79 00:03:55,033 --> 00:03:57,120 And the more things we match on, 80 00:03:57,120 --> 00:03:59,026 the harder it is to find a match. 81 00:03:59,026 --> 00:04:03,003 And you also have the possibility of residual confounding. 82 00:04:04,710 --> 00:04:07,328 So here's an example of a study that I've actually done 83 00:04:07,328 --> 00:04:10,110 where we used restriction and matching 84 00:04:10,110 --> 00:04:14,040 to decrease the impact of confounding on our results. 85 00:04:14,040 --> 00:04:15,761 So here we looked at patient-reported outcomes 86 00:04:15,761 --> 00:04:18,763 at six to 12 months among survivors 87 00:04:18,763 --> 00:04:21,316 of firearm injuries in the US. 88 00:04:21,316 --> 00:04:26,316 So we wanted to look at traumatic injuries as our exposure 89 00:04:26,459 --> 00:04:29,758 and patient-reported outcomes as our outcome. 90 00:04:29,758 --> 00:04:34,758 And then we thought that the green variables 91 00:04:34,800 --> 00:04:35,995 listed on the screen 92 00:04:35,995 --> 00:04:39,533 could be potential confounding variables. 93 00:04:39,533 --> 00:04:43,953 And so we restricted the study to some categories 94 00:04:43,953 --> 00:04:48,081 and then we matched the study on other categories. 95 00:04:48,081 --> 00:04:50,861 So we restricted our study to adults 96 00:04:50,861 --> 00:04:54,533 who were between the ages of 18 and 64 years. 97 00:04:56,818 --> 00:05:01,506 We looked only at patients admitted in 2015 to 2018, 98 00:05:01,506 --> 00:05:04,971 and we restricted our study to moderate to severe injuries 99 00:05:04,971 --> 00:05:08,591 from firearm injuries or motor vehicle crashes. 100 00:05:08,591 --> 00:05:12,199 And then we matched the firearm injury patients 101 00:05:12,199 --> 00:05:15,850 to motor vehicle crash injury patients 102 00:05:15,850 --> 00:05:19,997 on age, gender, race, and education level, 103 00:05:19,997 --> 00:05:24,997 and on a history of previous psychiatric illness diagnosis.
104 00:05:26,223 --> 00:05:28,444 And then we found that firearm injury survivors 105 00:05:28,444 --> 00:05:29,497 were more likely to have 106 00:05:29,497 --> 00:05:32,852 daily pain, screen positive for PTSD, 107 00:05:32,852 --> 00:05:34,734 and have worse physical and mental health 108 00:05:34,734 --> 00:05:35,914 related quality of life 109 00:05:35,914 --> 00:05:40,503 compared to similarly injured motor vehicle crash survivors. 110 00:05:41,432 --> 00:05:42,979 So the next thing that we can do 111 00:05:42,979 --> 00:05:45,842 to decrease the impact of confounding on our studies 112 00:05:45,842 --> 00:05:47,811 is called randomization, 113 00:05:47,811 --> 00:05:50,083 which is where you randomly allocate individuals 114 00:05:50,083 --> 00:05:51,046 to study groups. 115 00:05:51,046 --> 00:05:54,409 The strength is that the groups are likely to be balanced 116 00:05:54,409 --> 00:05:56,410 in terms of identified confounders 117 00:05:56,410 --> 00:05:59,907 and potential confounders that weren't identified. 118 00:05:59,907 --> 00:06:01,206 So in our example below 119 00:06:01,206 --> 00:06:05,875 we're comparing a random allocation of apples and oranges 120 00:06:05,875 --> 00:06:09,123 to a random allocation of apples and oranges. 121 00:06:10,380 --> 00:06:13,152 A potential unknown confounder 122 00:06:13,152 --> 00:06:16,621 that may impact our study results could be worms, 123 00:06:16,621 --> 00:06:18,807 and we don't know which fruit have worms 124 00:06:18,807 --> 00:06:21,873 because we can't see which ones have worms. 125 00:06:21,873 --> 00:06:24,793 But because we've randomly allocated the fruit, 126 00:06:24,793 --> 00:06:28,331 we've also randomly allocated the worms. 127 00:06:28,331 --> 00:06:32,068 The limitations are that you can have unlucky randomization, 128 00:06:32,068 --> 00:06:34,380 which can cause an imbalance, 129 00:06:34,380 --> 00:06:37,101 and this is more likely with small sample sizes.
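The point about unlucky randomization and small sample sizes can be illustrated with a small simulation. This is only a sketch under stated assumptions: the function name `mean_imbalance`, the 30% worm rate, the sample sizes, and the number of simulated trials are all made up for illustration and do not come from the lecture. It randomly splits fruit into two arms and measures how different the hidden worm prevalence ends up being between them.

```python
import random

def mean_imbalance(n_fruit, worm_rate=0.3, n_trials=200, seed=0):
    """Average absolute difference in worm prevalence between two randomized arms.

    All parameter values are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        # Hidden confounder: True if the fruit has a worm (the investigator cannot see this).
        has_worm = [rng.random() < worm_rate for _ in range(n_fruit)]
        # Random allocation: shuffle the fruit, then split them into two equal arms.
        rng.shuffle(has_worm)
        half = n_fruit // 2
        arm_a, arm_b = has_worm[:half], has_worm[half:]
        total += abs(sum(arm_a) / len(arm_a) - sum(arm_b) / len(arm_b))
    return total / n_trials

small = mean_imbalance(20)    # 10 fruit per arm
large = mean_imbalance(2000)  # 1000 fruit per arm
print(f"mean worm imbalance, n=20:   {small:.3f}")
print(f"mean worm imbalance, n=2000: {large:.3f}")
```

Under these assumptions the average imbalance in the unseen confounder should be noticeably larger in the small study than in the large one, which is the sense in which randomization balances unknown confounders only on average.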
130 00:06:37,101 --> 00:06:42,033 And randomization isn't always feasible or ethical. 131 00:06:43,083 --> 00:06:45,765 The next way that we can reduce 132 00:06:45,765 --> 00:06:49,068 the potential impacts of confounding on our study 133 00:06:49,068 --> 00:06:51,180 is called stratification. 134 00:06:51,180 --> 00:06:52,169 And this is when 135 00:06:52,169 --> 00:06:54,613 the study population is broken into subgroups 136 00:06:54,613 --> 00:06:58,035 according to the level of potential confounding factors, 137 00:06:58,035 --> 00:07:03,035 and you actually produce results for each level. 138 00:07:03,234 --> 00:07:06,420 So the strength is that you have more granular data, 139 00:07:06,420 --> 00:07:09,728 and the limitations are that you have smaller sample sizes 140 00:07:09,728 --> 00:07:11,548 within each subdivision. 141 00:07:11,548 --> 00:07:14,490 So here we would produce results 142 00:07:14,490 --> 00:07:17,005 for our oranges versus oranges analysis, 143 00:07:17,005 --> 00:07:20,130 and then we would also produce results 144 00:07:20,130 --> 00:07:22,519 for our apple versus apple analysis. 145 00:07:22,519 --> 00:07:24,070 Now, this becomes more difficult 146 00:07:24,070 --> 00:07:26,894 if we want to do an analysis on 147 00:07:26,894 --> 00:07:31,016 apples that are organic versus apples that are not organic. 148 00:07:31,016 --> 00:07:32,463 So the analysis 149 00:07:34,921 --> 00:07:37,826 becomes more complicated 150 00:07:37,826 --> 00:07:42,826 if you stratify it down to too many levels. 151 00:07:44,700 --> 00:07:46,179 So an example of a study that I've done 152 00:07:46,179 --> 00:07:49,439 where we 153 00:07:49,439 --> 00:07:51,590 used stratification 154 00:07:51,590 --> 00:07:54,783 is this study where we looked at the impact of 155 00:07:54,783 --> 00:07:57,598 income on emergency general surgery outcomes 156 00:07:57,598 --> 00:08:00,693 in urban and rural areas.
157 00:08:01,560 --> 00:08:05,498 So we wanted to have income as our exposure 158 00:08:05,498 --> 00:08:08,164 and emergency general surgery adverse outcomes 159 00:08:08,164 --> 00:08:10,127 as our outcome. 160 00:08:10,127 --> 00:08:12,395 And then rurality was the variable 161 00:08:12,395 --> 00:08:14,791 that we thought would be the confounding factor 162 00:08:14,791 --> 00:08:17,215 that we wanted to stratify by. 163 00:08:17,215 --> 00:08:20,470 So low income emergency general surgery patients 164 00:08:20,470 --> 00:08:23,044 have higher rates of postoperative adverse events 165 00:08:23,044 --> 00:08:25,350 compared to high income patients. 166 00:08:25,350 --> 00:08:28,421 And this may be related to healthcare segregation. 167 00:08:28,421 --> 00:08:32,136 The emergent nature of emergency general surgery conditions 168 00:08:32,136 --> 00:08:36,291 and limited number of emergency general surgery providers 169 00:08:36,291 --> 00:08:40,140 in rural areas may result in less healthcare segregation 170 00:08:40,140 --> 00:08:42,133 and therefore less variability 171 00:08:42,133 --> 00:08:46,383 in emergency general surgery outcomes in rural areas. 172 00:08:48,771 --> 00:08:51,616 And so to do this analysis, 173 00:08:51,616 --> 00:08:55,889 we actually stratified our analysis 174 00:08:55,889 --> 00:08:58,732 into rural groups and urban groups, 175 00:08:58,732 --> 00:09:00,874 and we found that low income was associated 176 00:09:00,874 --> 00:09:03,588 with higher rates of postoperative adverse events in urban, 177 00:09:03,588 --> 00:09:05,751 but not in rural settings. 178 00:09:05,751 --> 00:09:07,661 So the socioeconomic disparity 179 00:09:07,661 --> 00:09:10,596 in emergency general surgery outcomes in urban settings 180 00:09:10,596 --> 00:09:13,144 may reflect healthcare segregation, 181 00:09:13,144 --> 00:09:14,435 which is differential access 182 00:09:14,435 --> 00:09:17,643 to high quality healthcare for low income patients.
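The arithmetic behind a stratified analysis can be sketched with a few lines of code. Everything here is an assumption for illustration: the 2x2 counts are invented (they are not from the emergency general surgery study), the strata are labeled with the lecture's fruit analogy, and the Mantel-Haenszel pooled odds ratio is an added technique that the lecture does not name. The sketch compares the crude odds ratio, which ignores the stratifying variable, with the odds ratio inside each stratum.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a = exposed with outcome,   b = exposed without outcome,
    c = unexposed with outcome, d = unexposed without outcome.
    """
    return (a * d) / (b * c)

# Hypothetical counts (a, b, c, d) per stratum, invented for illustration only.
strata = {
    "apples": (80, 20, 40, 10),
    "oranges": (10, 40, 20, 80),
}

# Crude table: add the strata together cell by cell, ignoring fruit type.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

# Stratum-specific results: one odds ratio per level of the stratifying variable.
stratum_ors = {name: odds_ratio(*table) for name, table in strata.items()}

# Mantel-Haenszel pooled odds ratio (an addition, not named in the lecture).
numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata.values())
denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata.values())
mh_or = numerator / denominator

print(f"crude OR: {crude_or:.2f}")
for name, value in stratum_ors.items():
    print(f"{name} OR: {value:.2f}")
print(f"Mantel-Haenszel OR: {mh_or:.2f}")
```

With these made-up counts the crude odds ratio suggests an association that disappears within each stratum, which is the pattern expected when the stratifying variable is a confounder. A pooled estimate is generally only sensible when the stratum-specific estimates are similar; when they differ, as in the urban versus rural results described above, the stratum-specific results are the ones to report.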
183 00:09:19,727 --> 00:09:22,330 The last thing that we'll touch on today 184 00:09:22,330 --> 00:09:26,683 is multivariable or multivariate regression analysis. 185 00:09:26,683 --> 00:09:29,650 These are advanced statistical methods 186 00:09:29,650 --> 00:09:32,100 that allow us to remove the effects 187 00:09:32,100 --> 00:09:35,949 of specified confounding variables. 188 00:09:35,949 --> 00:09:40,949 And we'll really touch on these analysis techniques 189 00:09:42,196 --> 00:09:45,399 later on in epi two. 190 00:09:45,399 --> 00:09:47,278 But basically they allow us to see 191 00:09:47,278 --> 00:09:48,870 if there's a difference after removing 192 00:09:48,870 --> 00:09:52,572 the effects of specified confounding variables. 193 00:09:52,572 --> 00:09:54,450 The strengths are that it's inexpensive 194 00:09:54,450 --> 00:09:58,897 and that it can handle multiple confounders simultaneously. 195 00:09:58,897 --> 00:10:02,379 So this is a study where 196 00:10:02,379 --> 00:10:04,765 we did a multivariate analysis. 197 00:10:04,765 --> 00:10:07,756 We looked at the lethality of active shooter incidents 198 00:10:07,756 --> 00:10:11,460 with and without semi-automatic rifles in the US. 199 00:10:11,460 --> 00:10:14,587 And so our exposure was the presence of 200 00:10:14,587 --> 00:10:18,510 semi-automatic rifles at an active shooter event. 201 00:10:18,510 --> 00:10:23,142 And our outcome was the number of persons wounded or killed. 202 00:10:23,142 --> 00:10:25,184 The variables in green 203 00:10:25,184 --> 00:10:27,532 were the potential confounding variables 204 00:10:27,532 --> 00:10:29,403 that we wanted to adjust for. 205 00:10:29,403 --> 00:10:32,460 So the place of the shooting, the year of the shooting, 206 00:10:32,460 --> 00:10:34,860 and the presence of other firearms.
207 00:10:34,860 --> 00:10:38,006 And then we found after doing our analysis 208 00:10:38,006 --> 00:10:40,250 that more people were wounded or killed 209 00:10:40,250 --> 00:10:43,024 in incidents in which semi-automatic rifles were used 210 00:10:43,024 --> 00:10:46,143 compared to incidents involving other firearms. 211 00:10:47,910 --> 00:10:50,398 So today we've briefly discussed 212 00:10:50,398 --> 00:10:54,719 the ways to reduce or avoid confounding 213 00:10:54,719 --> 00:10:57,199 when we are designing a study: 214 00:10:57,199 --> 00:11:02,155 restriction, matching, randomization, stratification, 215 00:11:02,155 --> 00:11:05,341 and multivariate analysis. 216 00:11:05,341 --> 00:11:07,740 So that concludes this lecture. 217 00:11:07,740 --> 00:11:08,643 Take care.
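As a supplement to the multivariable regression discussion in this lecture, the idea of "removing the effects of a specified confounding variable" can be sketched on simulated data. This is a minimal illustration under stated assumptions, not the method used in the active shooter study: the data are simulated, the variable names are hypothetical, there is a single continuous confounder, and the adjustment is done by residualizing the exposure and the outcome on the confounder, which gives the same exposure coefficient as a linear regression that includes both variables.

```python
import random

def mean(values):
    return sum(values) / len(values)

def slope(x, y):
    """Least-squares slope from regressing y on x (with an intercept)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def residuals(y, z):
    """Part of y that remains after removing its linear association with z."""
    b, mz, my = slope(z, y), mean(z), mean(y)
    return [yi - (my + b * (zi - mz)) for yi, zi in zip(y, z)]

# Simulated data (hypothetical): the confounder drives both exposure and outcome,
# and the exposure has no true effect on the outcome.
rng = random.Random(1)
n = 2000
confounder = [rng.gauss(0, 1) for _ in range(n)]
exposure = [z + rng.gauss(0, 1) for z in confounder]
outcome = [2 * z + rng.gauss(0, 1) for z in confounder]

# Unadjusted: regress the outcome on the exposure alone.
unadjusted = slope(exposure, outcome)
# Adjusted: remove the confounder from both variables, then regress what is left.
adjusted = slope(residuals(exposure, confounder), residuals(outcome, confounder))

print(f"unadjusted slope: {unadjusted:.2f}")
print(f"adjusted slope:   {adjusted:.2f}")
```

In this simulation the unadjusted slope should look like a clear association, while the adjusted slope should be close to zero, because the apparent effect of the exposure was entirely due to the confounder. Real analyses with several confounders would typically use a statistics package rather than hand-written code.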