WEBVTT 1 00:00:01.950 --> 00:00:03.450 Welcome to the video lecture 2 00:00:03.450 --> 00:00:06.600 on qualitative data analysis 3 00:00:06.600 --> 00:00:09.690 where we will learn, once we've collected data, 4 00:00:09.690 --> 00:00:14.670 how do we analyze it, how do we make sense of it 5 00:00:14.670 --> 00:00:17.703 to turn the raw data into results? 6 00:00:21.150 --> 00:00:24.600 So we're gonna do a quick introduction 7 00:00:24.600 --> 00:00:27.690 to talk about the basic steps of the analysis 8 00:00:27.690 --> 00:00:31.410 and the specific method called coding, 9 00:00:31.410 --> 00:00:35.340 and then revisit the idea of how can the researcher 10 00:00:35.340 --> 00:00:39.540 demonstrate rigor in the process 11 00:00:39.540 --> 00:00:42.510 to make sure that the results are credible 12 00:00:42.510 --> 00:00:43.923 to the readers. 13 00:00:46.860 --> 00:00:51.430 So basically, the definition is to 14 00:00:53.280 --> 00:00:57.930 discover underlying meaning in non-numerical data. 15 00:00:57.930 --> 00:01:00.480 Note that this tends to be more art than science, 16 00:01:00.480 --> 00:01:04.230 that there's many right ways of doing it. 17 00:01:04.230 --> 00:01:07.140 It would be unusual that two researchers 18 00:01:07.140 --> 00:01:10.380 or more would do it in exactly the same way, 19 00:01:10.380 --> 00:01:12.690 but if done well, 20 00:01:12.690 --> 00:01:17.460 it can nonetheless be credible and useful. 21 00:01:17.460 --> 00:01:20.730 And note that it's also highly iterative, 22 00:01:20.730 --> 00:01:24.870 that you're always going back and reassessing 23 00:01:24.870 --> 00:01:28.890 and looking at it through new lenses 24 00:01:28.890 --> 00:01:30.393 as they emerge. 25 00:01:33.060 --> 00:01:35.610 So here are the basic five steps, 26 00:01:35.610 --> 00:01:37.503 and we'll walk through each one. 27 00:01:40.350 --> 00:01:43.320 So first, you want to document the data. 
28 00:01:43.320 --> 00:01:48.090 So this involves, you go and you do whatever it is, 29 00:01:48.090 --> 00:01:50.670 the interview or the observation. 30 00:01:50.670 --> 00:01:55.020 You often will record it and transcribe it. 31 00:01:55.020 --> 00:01:58.980 And you also wanna take field notes, and you wanna go 32 00:01:58.980 --> 00:02:02.793 and review and reorganize these field notes. 33 00:02:03.810 --> 00:02:08.100 And by transcribing, you have a word-for-word account 34 00:02:08.100 --> 00:02:10.563 of what the interview subject said. 35 00:02:13.830 --> 00:02:18.830 Next, you wanna start to organize them into concepts. 36 00:02:19.110 --> 00:02:23.970 And the best way to do this is just to spend some time 37 00:02:23.970 --> 00:02:28.770 with your data, reread your transcripts, reread your notes, 38 00:02:28.770 --> 00:02:33.770 make notes to yourself on themes and ideas 39 00:02:33.900 --> 00:02:36.870 and connections that you notice. 40 00:02:36.870 --> 00:02:40.170 And this can be done either one question at a time, 41 00:02:40.170 --> 00:02:44.400 so you look at all of the responses 42 00:02:44.400 --> 00:02:48.093 to question one from all of your respondents, 43 00:02:49.533 --> 00:02:53.100 or you look at all of the first respondent's answers, 44 00:02:53.100 --> 00:02:54.690 then the second's. 45 00:02:54.690 --> 00:02:59.690 And really, both of these two ways work. 46 00:03:02.790 --> 00:03:07.020 And then you code them, you name the patterns 47 00:03:07.020 --> 00:03:10.350 that you perceive, 48 00:03:10.350 --> 00:03:15.090 and you start to sort of organize these concepts 49 00:03:15.090 --> 00:03:19.620 into these codes to sort of give them a meaningful structure.
50 00:03:19.620 --> 00:03:23.940 Some of these codes may be preset based on the questions 51 00:03:23.940 --> 00:03:26.130 that you ask, but many of them will 52 00:03:26.130 --> 00:03:29.460 and should be emergent, you know, sort of themes 53 00:03:29.460 --> 00:03:31.470 that you would not have thought of, 54 00:03:31.470 --> 00:03:33.870 but that emerge from the data. 55 00:03:33.870 --> 00:03:36.570 And in a few minutes, 56 00:03:36.570 --> 00:03:40.443 I'm gonna talk about the specific mechanics of coding. 57 00:03:43.680 --> 00:03:47.880 So when you create these codes, in many cases, 58 00:03:47.880 --> 00:03:52.880 you want to categorize them into sort of classes, 59 00:03:53.340 --> 00:03:57.120 something like present or absent, 60 00:03:57.120 --> 00:03:58.980 something they said or they didn't say, 61 00:03:58.980 --> 00:04:01.710 or high, medium, low, a strong belief, 62 00:04:01.710 --> 00:04:05.250 a medium belief, a low one, 63 00:04:05.250 --> 00:04:09.540 or sort of the intensity of a feeling 64 00:04:09.540 --> 00:04:12.003 or an awareness. 65 00:04:13.440 --> 00:04:15.990 And it's very, again, 66 00:04:15.990 --> 00:04:20.990 no two researchers will maybe do this exactly the same way, 67 00:04:22.500 --> 00:04:25.740 but it's important for you to be clear 68 00:04:25.740 --> 00:04:29.190 about where the boundaries of these boxes are. 69 00:04:29.190 --> 00:04:34.190 How did you sort into high, medium, low, and why? 70 00:04:34.380 --> 00:04:37.860 What are the boundaries that, well, what makes you say 71 00:04:37.860 --> 00:04:42.270 that this one is high intensity and this one is low? 72 00:04:42.270 --> 00:04:45.093 And make sure that you document it as you go. 73 00:04:47.670 --> 00:04:50.520 Next, you start to look at 74 00:04:50.520 --> 00:04:53.340 connections within the data.
75 00:04:53.340 --> 00:04:56.400 And I'm going to use an analysis 76 00:04:56.400 --> 00:04:59.890 or use, sorry, an analogy to 77 00:05:00.900 --> 00:05:02.940 statistical analysis. 78 00:05:02.940 --> 00:05:07.140 So the first way is looking at these themes. 79 00:05:07.140 --> 00:05:11.850 So you can think about creating a spreadsheet 80 00:05:11.850 --> 00:05:15.780 where the columns are the themes 81 00:05:15.780 --> 00:05:19.050 and the rows are the respondents. 82 00:05:19.050 --> 00:05:23.160 And within that column 83 00:05:23.160 --> 00:05:27.930 and row is a little snippet 84 00:05:27.930 --> 00:05:32.730 of the interview, which fits underneath that theme, 85 00:05:32.730 --> 00:05:36.120 which was spoken by that respondent. 86 00:05:36.120 --> 00:05:38.220 And on your first go-round, 87 00:05:38.220 --> 00:05:42.330 you're just sort of describing what are those themes 88 00:05:42.330 --> 00:05:45.150 and how often do they occur? 89 00:05:45.150 --> 00:05:47.550 And then in the second, 90 00:05:47.550 --> 00:05:51.360 more analogous to bivariate analysis, 91 00:05:51.360 --> 00:05:55.590 is there a relationship between how someone, 92 00:05:55.590 --> 00:05:58.140 say, those who were high, medium, or low 93 00:05:58.140 --> 00:06:03.140 on a certain theme, relates to another theme? 94 00:06:06.300 --> 00:06:10.230 So again, you can list like 95 00:06:10.230 --> 00:06:12.060 what were the themes, 96 00:06:12.060 --> 00:06:15.480 how often were they mentioned? 97 00:06:15.480 --> 00:06:19.500 And it's good to select one or two quotations, 98 00:06:19.500 --> 00:06:22.470 which really sort of epitomize 99 00:06:22.470 --> 00:06:24.450 or embody those themes. 100 00:06:24.450 --> 00:06:28.800 And again, this is analogous 101 00:06:28.800 --> 00:06:32.460 to writing out the descriptive statistics, 102 00:06:32.460 --> 00:06:36.633 like a frequency analysis with quantitative data.
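The spreadsheet idea described above can be sketched in a few lines of Python. This is only an illustration, not the lecturer's own tooling: the respondents, themes, levels, and snippets are all made up, and the helper names `theme_frequency` and `cross_tab` are assumptions introduced here. The first function is the descriptive pass (how often each theme comes up), and the second is the rough analogue of a bivariate look at two themes.

```python
from collections import Counter

# Hypothetical data matrix: rows are respondents, columns are themes.
# Each cell holds a (level, snippet) pair, or None if the theme never came up.
# All respondents, levels, and snippets are invented for illustration.
matrix = {
    "A": {"environmental_values": ("high", "I try to cut waste wherever I can."),
          "reusable_bottle": ("present", "I carry my own bottle everywhere.")},
    "B": {"environmental_values": ("low", "I don't think about it much."),
          "reusable_bottle": None},
    "C": {"environmental_values": ("high", "It shapes most of what I buy."),
          "reusable_bottle": ("present", "I refill at the fountain.")},
}


def theme_frequency(matrix):
    """First pass: count how many respondents mentioned each theme."""
    counts = Counter()
    for cells in matrix.values():
        for theme, cell in cells.items():
            if cell is not None:
                counts[theme] += 1
    return counts


def cross_tab(matrix, theme_x, theme_y):
    """Second pass: count respondents by their pair of levels on two themes."""
    table = Counter()
    for cells in matrix.values():
        x = cells.get(theme_x)
        y = cells.get(theme_y)
        level_x = x[0] if x else "absent"
        level_y = y[0] if y else "absent"
        table[(level_x, level_y)] += 1
    return table


if __name__ == "__main__":
    print(theme_frequency(matrix))
    print(cross_tab(matrix, "environmental_values", "reusable_bottle"))
```

Keeping the snippet in the cell next to the level means a quotation that epitomizes a theme is always one lookup away when it is time to report.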
103 00:06:38.100 --> 00:06:42.210 And then once you've made this data matrix, 104 00:06:42.210 --> 00:06:46.810 you can see are there relationships between 105 00:06:48.960 --> 00:06:51.030 theme X and theme Y. 106 00:06:51.030 --> 00:06:55.230 Sort of like, and one example from the past, 107 00:06:55.230 --> 00:07:00.230 in a previous version of this class, 108 00:07:01.290 --> 00:07:05.343 is we saw a relationship between, 109 00:07:07.350 --> 00:07:09.930 between one's class standing 110 00:07:09.930 --> 00:07:14.340 and the intensity of environmental values 111 00:07:14.340 --> 00:07:18.480 and the use of reusable water bottles, 112 00:07:18.480 --> 00:07:21.720 that we found that there's 113 00:07:21.720 --> 00:07:24.850 sort of a UVM effect, 114 00:07:26.400 --> 00:07:30.810 and this is maybe before environmental values 115 00:07:30.810 --> 00:07:33.510 were as widespread as they are now, 116 00:07:33.510 --> 00:07:36.210 but students who had been here longer 117 00:07:36.210 --> 00:07:38.880 tended to sort of learn about and embrace 118 00:07:38.880 --> 00:07:41.550 and hold stronger environmental values. 119 00:07:41.550 --> 00:07:44.080 And one of the things that they talked about 120 00:07:47.820 --> 00:07:52.380 in how they behave based on these values is using a reusable 121 00:07:52.380 --> 00:07:56.440 water bottle instead of buying a throwaway 122 00:07:57.480 --> 00:07:58.823 plastic one each time. 123 00:07:58.823 --> 00:08:02.430 But for any of these themes, 124 00:08:02.430 --> 00:08:06.000 are there recurring patterns that someone who is high 125 00:08:06.000 --> 00:08:09.510 or medium in one will also be 126 00:08:09.510 --> 00:08:12.030 high, medium or low, et cetera 127 00:08:12.030 --> 00:08:13.953 in the next. 128 00:08:16.170 --> 00:08:19.210 Next, you want to corroborate, you want to sort of 129 00:08:20.640 --> 00:08:25.170 compare the meaning that you have discovered 130 00:08:25.170 --> 00:08:28.440 with other contexts.
131 00:08:28.440 --> 00:08:31.650 So you think about why did they say that? 132 00:08:31.650 --> 00:08:34.590 Are there other data sets or contexts or theories 133 00:08:34.590 --> 00:08:38.100 or results that sort of speak to this? 134 00:08:38.100 --> 00:08:41.880 Are there other reasons why they might be saying it? 135 00:08:41.880 --> 00:08:45.390 And here's a place where you really wanna be reflexive 136 00:08:45.390 --> 00:08:47.790 and make sure to examine 137 00:08:47.790 --> 00:08:51.570 and try to take into account your own biases 138 00:08:51.570 --> 00:08:55.680 and why did you see this the way that you did? 139 00:08:55.680 --> 00:08:58.140 Are there other ways that other researchers 140 00:08:58.140 --> 00:09:00.813 who aren't you might have seen it and why? 141 00:09:03.870 --> 00:09:06.360 Then you want to report out. 142 00:09:06.360 --> 00:09:11.340 So you wanna be very clear about why did you name the codes 143 00:09:11.340 --> 00:09:16.340 the way that you did and those relationships. 144 00:09:16.380 --> 00:09:20.670 And again, that's why sort of memoing to yourself 145 00:09:20.670 --> 00:09:25.530 really helps you to be able to clearly say, 146 00:09:25.530 --> 00:09:29.250 I put this quotation 147 00:09:29.250 --> 00:09:31.110 under this theme, 148 00:09:31.110 --> 00:09:36.110 and I saw this relationship between these themes 149 00:09:36.300 --> 00:09:39.693 and explain why you made the decision that you did. 150 00:09:43.590 --> 00:09:48.450 It's a good idea again, I had said how it's good to find 151 00:09:48.450 --> 00:09:51.630 a couple quotations per important theme, 152 00:09:51.630 --> 00:09:54.750 which really speak to it.
153 00:09:54.750 --> 00:09:59.370 And the reason that you do is it 154 00:09:59.370 --> 00:10:02.190 provides specific evidence 155 00:10:02.190 --> 00:10:06.090 and then avoids the, sort of take my word for it, 156 00:10:06.090 --> 00:10:08.820 sort of what's a response 157 00:10:08.820 --> 00:10:13.050 that was commonly said. 158 00:10:13.050 --> 00:10:17.160 Was there a range of what folks said? 159 00:10:17.160 --> 00:10:19.128 And you also wanna explain why did you 160 00:10:19.128 --> 00:10:24.128 choose that quotation? 161 00:10:25.800 --> 00:10:30.030 And finally, you wanna allow for enough of the quotation 162 00:10:30.030 --> 00:10:31.800 to set some context. 163 00:10:31.800 --> 00:10:35.627 But you always want to respect confidentiality, 164 00:10:37.020 --> 00:10:38.760 that you don't wanna give 165 00:10:38.760 --> 00:10:43.760 so much information that the reader can attribute 166 00:10:44.790 --> 00:10:48.870 the quotation to any individual. 167 00:10:48.870 --> 00:10:51.210 Because in most cases, 168 00:10:51.210 --> 00:10:53.880 when you do the IRB process 169 00:10:53.880 --> 00:10:56.610 and you talk to the subject 170 00:10:56.610 --> 00:11:00.840 about the research process, 171 00:11:00.840 --> 00:11:05.310 part of the agreement that you make 172 00:11:05.310 --> 00:11:10.020 is that you will not attribute any specific 173 00:11:10.020 --> 00:11:13.250 quotation or answer to any individual. 174 00:11:16.020 --> 00:11:19.680 So you want to be careful that you have enough 175 00:11:19.680 --> 00:11:23.070 to set the context so the reader understands, 176 00:11:23.070 --> 00:11:26.613 but not so much that you're identifying who said it. 177 00:11:29.340 --> 00:11:33.180 Finally, when you report, you wanna talk about shortfalls, 178 00:11:33.180 --> 00:11:36.580 you wanna talk about the lack of generalizability, 179 00:11:39.570 --> 00:11:42.600 any biases of yourself that you might have.
180 00:11:42.600 --> 00:11:47.220 And finally, you want to talk about what future questions 181 00:11:47.220 --> 00:11:51.513 and directions do these results suggest. 182 00:11:53.400 --> 00:11:57.903 And here are some more good resources to read up on that. 183 00:12:00.540 --> 00:12:03.990 So now I'm gonna talk about the actual process of coding, 184 00:12:03.990 --> 00:12:08.990 of how do you name and organize bits of data 185 00:12:09.810 --> 00:12:12.543 from your collection process 186 00:12:16.560 --> 00:12:18.570 such as interviews. 187 00:12:18.570 --> 00:12:21.810 So you can think of it as that you're taking 188 00:12:21.810 --> 00:12:25.320 little pieces of paper, so you don't actually do this, 189 00:12:25.320 --> 00:12:29.970 but conceptually you take quotations 190 00:12:29.970 --> 00:12:31.950 and you sort of cut them out, 191 00:12:31.950 --> 00:12:34.854 and then you put them into 192 00:12:34.854 --> 00:12:39.854 little folders organized 193 00:12:40.890 --> 00:12:43.200 by common themes. 194 00:12:43.200 --> 00:12:46.890 And again, this is a very inductive process 195 00:12:46.890 --> 00:12:51.630 where you use the data to come up with these themes. 196 00:12:51.630 --> 00:12:56.160 It isn't that you start with a theory, 197 00:12:56.160 --> 00:13:00.033 you start with the data and therefore it is induction. 198 00:13:02.730 --> 00:13:07.320 So you can think of the themes being like a folder, 199 00:13:07.320 --> 00:13:11.790 and a folder has a name on it, 200 00:13:11.790 --> 00:13:13.800 the name of the code, 201 00:13:13.800 --> 00:13:17.130 and you put sort of these statements 202 00:13:17.130 --> 00:13:19.440 that fall under that theme, 203 00:13:19.440 --> 00:13:24.440 that sort of articulate 204 00:13:25.020 --> 00:13:26.970 a common meaning. 205 00:13:26.970 --> 00:13:31.970 You put it under these, you conceptually put them together 206 00:13:33.240 --> 00:13:35.103 in these folders.
207 00:13:39.540 --> 00:13:42.480 This coding process happens in three steps, 208 00:13:42.480 --> 00:13:45.300 open, axial, and selective. 209 00:13:45.300 --> 00:13:47.430 And one way that I like to conceptualize it 210 00:13:47.430 --> 00:13:51.450 is that the open codes are like the file folders. 211 00:13:51.450 --> 00:13:55.620 The axial codes are like the file cabinet drawers. 212 00:13:55.620 --> 00:14:00.150 And the selective codes are the overall file cabinet. 213 00:14:00.150 --> 00:14:03.090 I think you'll see what I mean in a second. 214 00:14:03.090 --> 00:14:06.240 So open codes are these folders, 215 00:14:06.240 --> 00:14:09.360 and that's the first step of what are the emergent themes 216 00:14:09.360 --> 00:14:12.150 that arise in your first read-through. 217 00:14:12.150 --> 00:14:15.000 And they are determined 218 00:14:15.000 --> 00:14:19.140 by the researcher's examination of the data 219 00:14:19.140 --> 00:14:22.680 where you uncover, name, and expose the meaning. 220 00:14:22.680 --> 00:14:25.680 And at the end you'll probably have, 221 00:14:25.680 --> 00:14:29.700 depending on the length of and number of interviews, 222 00:14:29.700 --> 00:14:30.840 there'll be a lot of them, 223 00:14:30.840 --> 00:14:35.840 maybe let's say 40 of them, where there's a long list. 224 00:14:35.850 --> 00:14:37.950 They're not really sorted 225 00:14:37.950 --> 00:14:41.040 by any frequency or importance. 226 00:14:41.040 --> 00:14:43.770 Just sort of a long list of themes 227 00:14:43.770 --> 00:14:48.753 that you notice in your first few read-throughs of the data. 228 00:14:49.950 --> 00:14:52.650 Next is the axial code, 229 00:14:52.650 --> 00:14:56.340 and this is the secondary analysis 230 00:14:56.340 --> 00:15:00.930 where you think about how do your codes fit into groups.
231 00:15:00.930 --> 00:15:05.280 So you sort of code your codes where you pare down 232 00:15:05.280 --> 00:15:07.590 and you sort of put like with like, 233 00:15:07.590 --> 00:15:12.510 and think about and come down to maybe five axial codes 234 00:15:12.510 --> 00:15:16.680 of what are five sort of next level 235 00:15:16.680 --> 00:15:21.680 more comprehensive codes that these say 40 236 00:15:23.220 --> 00:15:25.353 open codes fit into. 237 00:15:26.220 --> 00:15:30.330 And this is like the file cabinet drawer 238 00:15:30.330 --> 00:15:35.330 in which these various folders fit. 239 00:15:37.080 --> 00:15:39.780 And last is selective coding, 240 00:15:39.780 --> 00:15:43.530 where you try to think of one or two 241 00:15:43.530 --> 00:15:45.900 of the most important concepts 242 00:15:45.900 --> 00:15:50.900 that really sort of sum up the meaning of your data. 243 00:15:51.120 --> 00:15:55.893 And this is analogous to the overall file cabinet. 244 00:15:58.290 --> 00:16:00.630 In this way you sort of create a tree 245 00:16:00.630 --> 00:16:03.040 where the open codes are on top 246 00:16:04.050 --> 00:16:08.280 and you see how they fit into various secondary codes. 247 00:16:08.280 --> 00:16:12.873 And then one overarching code at the bottom. 248 00:16:15.810 --> 00:16:18.540 As I said before, it's really important, 249 00:16:18.540 --> 00:16:21.393 as you do the coding, to write memos to yourself. 250 00:16:22.620 --> 00:16:25.830 Remind yourself what you did and why. 251 00:16:25.830 --> 00:16:28.140 Why did you call this what you did? 252 00:16:28.140 --> 00:16:30.273 Why did you make the decisions? 253 00:16:31.855 --> 00:16:35.710 Why did you put this quotation under this code? 254 00:16:38.040 --> 00:16:41.370 And if you have any sort of theoretical ties 255 00:16:41.370 --> 00:16:45.393 and such, again, put all those into a memo. 256 00:16:46.860 --> 00:16:50.640 So here again is the evidence of rigor.
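The folder, drawer, and cabinet picture of open, axial, and selective codes maps naturally onto a small nested data structure. What follows is a minimal sketch under stated assumptions: the class names, the `outline` helper, and every code name, quotation, and memo in the example are hypothetical and invented for illustration, not taken from the lecture or from any qualitative analysis package. The `memo` field is there to hold the notes to yourself about why a code was named and filled the way it was.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OpenCode:
    """A 'file folder': one emergent theme plus the quotations filed under it."""
    name: str
    quotations: List[str] = field(default_factory=list)
    memo: str = ""  # note to self: why this name, why these quotations


@dataclass
class AxialCode:
    """A 'file cabinet drawer': a group of related open codes."""
    name: str
    open_codes: List[OpenCode] = field(default_factory=list)


@dataclass
class SelectiveCode:
    """The 'file cabinet': the overarching concept for the whole data set."""
    name: str
    axial_codes: List[AxialCode] = field(default_factory=list)


def outline(selective: SelectiveCode) -> List[str]:
    """Return the hierarchy as indented lines, most general code first."""
    lines = [selective.name]
    for axial in selective.axial_codes:
        lines.append("  " + axial.name)
        for code in axial.open_codes:
            lines.append(f"    {code.name} ({len(code.quotations)} quotations)")
    return lines


# Hypothetical example; every name and quotation here is made up.
cabinet = SelectiveCode("campus culture shapes behavior", [
    AxialCode("values", [
        OpenCode("cares about waste", ["I hate throwing plastic away."],
                 memo="Named for the respondent's own wording."),
        OpenCode("learned on campus", ["I picked it up from friends here."]),
    ]),
    AxialCode("habits", [
        OpenCode("carries reusable bottle",
                 ["I bring my bottle to class.", "I refill it daily."]),
    ]),
])

if __name__ == "__main__":
    print("\n".join(outline(cabinet)))
```

The outline prints the overarching code first and the open codes last, which is simply the lecture's tree drawn the other way up.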
257 00:16:50.640 --> 00:16:53.400 And I'm gonna talk to these bottom ones here, 258 00:16:53.400 --> 00:16:57.480 especially where you want your codes to make sense, 259 00:16:57.480 --> 00:17:01.863 to be transparent and you want multiple. 260 00:17:03.090 --> 00:17:06.570 You often wanna use multiple coders 261 00:17:06.570 --> 00:17:09.000 who do the coding independently 262 00:17:09.000 --> 00:17:12.510 and then see if they sort of came up with the same thing. 263 00:17:12.510 --> 00:17:16.890 And if so, that makes your data more reliable. 264 00:17:16.890 --> 00:17:21.240 And when you report out, again, 265 00:17:21.240 --> 00:17:25.800 being transparent about what you did and why, 266 00:17:25.800 --> 00:17:30.000 and that the voice of the respondents 267 00:17:30.000 --> 00:17:33.150 should come through, and in many cases, 268 00:17:33.150 --> 00:17:36.960 due to really good quotations that you choose, 269 00:17:36.960 --> 00:17:39.063 which sum up these codes. 270 00:17:42.000 --> 00:17:44.400 Here are five more ideas. 271 00:17:44.400 --> 00:17:49.400 This is a really good article here 272 00:17:49.770 --> 00:17:54.770 about how can you show and what should you think about 273 00:17:56.460 --> 00:17:58.950 for the trustworthiness of your data? 274 00:17:58.950 --> 00:18:02.040 How do you show the analysis that you did 275 00:18:02.040 --> 00:18:05.370 and the way you did about what about it 276 00:18:05.370 --> 00:18:08.910 that the reader can trust that it was done well 277 00:18:08.910 --> 00:18:11.553 and can sort of believe what you said. 278 00:18:14.460 --> 00:18:18.900 As with most other social science research, 279 00:18:18.900 --> 00:18:21.093 you want to think of ethical issues. 280 00:18:21.960 --> 00:18:24.990 Did you only find what you thought that you would? 281 00:18:24.990 --> 00:18:27.370 Did your confirmation bias 282 00:18:29.160 --> 00:18:31.950 overwhelm all other things? 
283 00:18:31.950 --> 00:18:36.540 Are you clear about, were you reflexive 284 00:18:36.540 --> 00:18:38.670 and say, well, I saw this but I could, you know, 285 00:18:38.670 --> 00:18:41.400 but that's because of the lens I bring 286 00:18:41.400 --> 00:18:44.683 and here are other ways that it may be interpreted. 287 00:18:48.240 --> 00:18:52.870 And then again, you don't want to attribute 288 00:18:53.790 --> 00:18:56.730 any quotation to any individual, 289 00:18:56.730 --> 00:18:59.430 but you may wish to just sort of track 290 00:18:59.430 --> 00:19:03.420 how an individual answers a number of questions. 291 00:19:03.420 --> 00:19:08.010 So in this case, something like a pseudonym would, you know, 292 00:19:08.010 --> 00:19:12.390 call 'em a name, make up a name, 293 00:19:12.390 --> 00:19:15.540 or even a number or A, B, C, D. 294 00:19:15.540 --> 00:19:18.420 But that allows you to sort of say, 295 00:19:18.420 --> 00:19:23.420 show how respondent A talked about this and this and this 296 00:19:23.640 --> 00:19:25.683 and that may be an important thread. 297 00:19:27.210 --> 00:19:30.720 And here again, are the key takeaways. 298 00:19:30.720 --> 00:19:32.553 Thank you for watching.
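The pseudonym idea mentioned above, tracking how one respondent answers across questions without naming them, can be sketched as follows. This is a rough illustration under assumptions: the identifiers, question labels, and themes are invented, and `assign_pseudonyms` and `thread_for` are hypothetical helper names introduced here. In practice the mapping from real identifiers to labels would be stored separately from anything that gets reported.

```python
import string


def assign_pseudonyms(respondent_ids):
    """Map each real identifier to a letter label (A, B, C, ...) in first-seen order."""
    labels = {}
    for rid in respondent_ids:
        if rid not in labels:
            if len(labels) >= len(string.ascii_uppercase):
                raise ValueError("more than 26 respondents; use numbers instead")
            labels[rid] = string.ascii_uppercase[len(labels)]
    return labels


def thread_for(responses, labels, label):
    """Collect every (question, theme) pair from one pseudonymous respondent."""
    return [(question, theme)
            for rid, question, theme in responses
            if labels[rid] == label]


# Hypothetical coded responses: (real identifier, question, theme).
responses = [
    ("student_017", "Q1", "values"),
    ("student_042", "Q1", "habits"),
    ("student_017", "Q2", "habits"),
]

labels = assign_pseudonyms(rid for rid, _, _ in responses)

if __name__ == "__main__":
    print(thread_for(responses, labels, "A"))
```

Only the letter labels would appear in a report, so the thread for respondent A can be shown without attributing any quotation to an individual.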