WEBVTT 1 00:00:02.040 --> 00:00:05.973 Hello and welcome to the video lecture on sampling. 2 00:00:07.380 --> 00:00:11.070 In this lecture we will learn about 3 00:00:11.070 --> 00:00:14.850 how you choose the people or the subjects 4 00:00:14.850 --> 00:00:17.520 from which you are going to collect data. 5 00:00:17.520 --> 00:00:19.323 What are the considerations? 6 00:00:20.970 --> 00:00:23.610 So we're gonna talk about the two main types, 7 00:00:23.610 --> 00:00:27.900 non-probability and probability, 8 00:00:27.900 --> 00:00:31.650 and then talk about some sort of vocabulary 9 00:00:31.650 --> 00:00:33.420 that goes along with that. 10 00:00:33.420 --> 00:00:37.140 Talking about parameters and statistics 11 00:00:37.140 --> 00:00:40.353 and then confidence levels and intervals. 12 00:00:43.380 --> 00:00:47.070 So again, the idea of sampling here 13 00:00:47.070 --> 00:00:49.980 is from whom do we collect data? 14 00:00:49.980 --> 00:00:52.650 How do we choose, what are our methods 15 00:00:52.650 --> 00:00:57.650 for choosing the right subjects for the research question 16 00:01:00.750 --> 00:01:03.333 or the research problem that we're dealing with? 17 00:01:06.360 --> 00:01:10.620 So as a motivating example, I wanna make it clear 18 00:01:10.620 --> 00:01:13.170 that depending on who you ask, 19 00:01:13.170 --> 00:01:16.410 you're gonna get very different answers. 20 00:01:16.410 --> 00:01:19.830 Say you asked who's your favorite musical artist, 21 00:01:19.830 --> 00:01:21.570 who's your favorite actor? 22 00:01:21.570 --> 00:01:25.350 If you asked my kids, if you had asked my daughter, 23 00:01:25.350 --> 00:01:30.350 she would say, Taylor Swift. 24 00:01:31.110 --> 00:01:34.920 If you asked me, I'd probably say David Bowie. 25 00:01:34.920 --> 00:01:37.920 If you ask folks older than me, 26 00:01:37.920 --> 00:01:39.420 they would have other answers.
27 00:01:41.880 --> 00:01:46.880 Similarly, if you ask what's the most important policy issue, 28 00:01:50.940 --> 00:01:55.173 that would change from say, a red state to a blue state, 29 00:01:56.416 --> 00:01:59.250 and then people from the US versus China 30 00:01:59.250 --> 00:02:03.303 versus Iran versus other nations. 31 00:02:06.170 --> 00:02:07.830 So depending on who you ask, 32 00:02:07.830 --> 00:02:11.253 you're gonna get a different response. 33 00:02:13.140 --> 00:02:15.840 So we're gonna talk about the two main types, 34 00:02:15.840 --> 00:02:19.743 non-probability and probability. 35 00:02:25.669 --> 00:02:28.180 So the first type, non-probability, 36 00:02:29.168 --> 00:02:32.490 means that you cannot state the probability 37 00:02:32.490 --> 00:02:35.913 of any individual being selected. 38 00:02:36.852 --> 00:02:39.240 That you can't say there's a one in 10 chance, 39 00:02:39.240 --> 00:02:41.520 there's a one in five chance, 40 00:02:41.520 --> 00:02:46.520 there's a 20% chance, anything like that. 41 00:02:48.508 --> 00:02:51.720 This is actually used very often 42 00:02:51.720 --> 00:02:56.720 in social science research when the number 43 00:02:57.060 --> 00:03:02.060 of knowledgeable subjects is small or hard to reach. 44 00:03:03.150 --> 00:03:07.863 So I use the example of NFL fans in Belgium, 45 00:03:08.928 --> 00:03:12.180 or it might be some well-known musical artist. 46 00:03:12.180 --> 00:03:16.020 Then rather than sort of going out 47 00:03:16.020 --> 00:03:21.020 and doing a representative sample of Belgium or of Vermont 48 00:03:21.540 --> 00:03:25.440 to find out about this artist, 49 00:03:25.440 --> 00:03:30.440 it might make more sense to go to where this artist 50 00:03:30.540 --> 00:03:34.140 is playing and ask folks there 51 00:03:34.140 --> 00:03:36.930 because they're gonna know things about it 52 00:03:36.930 --> 00:03:39.570 and actually have experiences about that.
53 00:03:39.570 --> 00:03:44.220 In many cases too, they might have specialized knowledge, 54 00:03:44.220 --> 00:03:49.110 and in some cases it's done for convenience. 55 00:03:49.110 --> 00:03:52.950 It can just be the most convenient, the easiest, 56 00:03:52.950 --> 00:03:56.613 the lowest cost, and sometimes that's the best way. 57 00:03:59.040 --> 00:04:04.040 So here are four types of non-probability sampling. 58 00:04:05.010 --> 00:04:10.010 I note that they are not mutually exclusive, 59 00:04:10.890 --> 00:04:14.700 that they're not like clear boxes of this or this, 60 00:04:14.700 --> 00:04:19.323 that there will be overlap among them. 61 00:04:21.330 --> 00:04:25.113 So, first is convenience sampling. 62 00:04:26.386 --> 00:04:28.050 And this is where you just sort of go 63 00:04:28.050 --> 00:04:30.993 and talk to people where they may be, 64 00:04:32.160 --> 00:04:36.960 sort of the person on the street perspective, 65 00:04:36.960 --> 00:04:38.520 go to the library 66 00:04:38.520 --> 00:04:43.110 or go to a dining hall to talk to students, 67 00:04:43.110 --> 00:04:45.033 'cause that is where they are. 68 00:04:45.870 --> 00:04:48.540 If you're interested in market research 69 00:04:48.540 --> 00:04:50.310 on Fair Trade coffee, you might go 70 00:04:50.310 --> 00:04:52.423 to a specialty coffee shop. 71 00:04:57.390 --> 00:05:01.770 You wanna go to where consumers will be 72 00:05:01.770 --> 00:05:04.893 that are likely to want your product. 73 00:05:08.850 --> 00:05:11.970 Next is purposive or judgmental sampling. 74 00:05:11.970 --> 00:05:15.723 And this is where you, as the researcher, 75 00:05:17.730 --> 00:05:22.050 seek out and find subjects that you think have information 76 00:05:22.050 --> 00:05:26.730 that you want, that'll be valuable, that'll be useful. 77 00:05:26.730 --> 00:05:30.750 In many cases, we call these key informants, 78 00:05:30.750 --> 00:05:33.633 people who know about a certain subject.
79 00:05:35.073 --> 00:05:39.407 So if you wanna know about, again, a certain artist, 80 00:05:41.280 --> 00:05:45.033 you may wanna go talk to their fans. 81 00:05:46.200 --> 00:05:51.200 If you wanna know about a certain novel kind of food, 82 00:05:54.240 --> 00:05:58.830 go find people who eat it or who make it, et cetera. 83 00:05:58.830 --> 00:06:03.830 When we do this, it is important, to the extent that we can, 84 00:06:04.620 --> 00:06:08.733 to use what we call maximum variability sampling. 85 00:06:09.804 --> 00:06:14.804 And that is to think about what are the dimensions 86 00:06:14.880 --> 00:06:18.150 of variability in the subjects. 87 00:06:18.150 --> 00:06:21.750 And you might wanna think about gender, age, income, 88 00:06:21.750 --> 00:06:25.920 geography, and not just talk to the same types of folks, 89 00:06:25.920 --> 00:06:29.100 so not just talk to males 90 00:06:29.100 --> 00:06:33.120 who are older and wealthier and live in Shelburne, 91 00:06:33.120 --> 00:06:36.300 that you wanna sort of get a range of gender, age, 92 00:06:36.300 --> 00:06:41.300 income and variables like that. 93 00:06:43.560 --> 00:06:46.620 You also may wish to seek outliers. 94 00:06:46.620 --> 00:06:50.760 Find folks that you think will have a different 95 00:06:50.760 --> 00:06:54.213 or unique opinion, that don't see it in that way, 96 00:06:55.890 --> 00:07:00.890 so that you get a much broader range of views. 97 00:07:04.290 --> 00:07:08.010 Next is snowball sampling, and it's called that because 98 00:07:08.010 --> 00:07:11.640 it's like when you make a small snowball 99 00:07:11.640 --> 00:07:13.890 and roll it in the snow, 100 00:07:13.890 --> 00:07:17.370 and that's how you make a snowman of course. 101 00:07:17.370 --> 00:07:19.080 It gets bigger and bigger.
102 00:07:19.080 --> 00:07:24.080 And in this way, once you find somebody who's knowledgeable, 103 00:07:24.900 --> 00:07:27.270 you can ask them, who else knows about this, 104 00:07:27.270 --> 00:07:28.710 who should I talk to? 105 00:07:28.710 --> 00:07:33.710 Many times the last question I ask in an interview is, 106 00:07:33.990 --> 00:07:36.990 are there others that you could recommend 107 00:07:36.990 --> 00:07:39.273 that are knowledgeable, that I could talk to? 108 00:07:40.680 --> 00:07:43.540 This is especially valuable when 109 00:07:43.540 --> 00:07:47.400 those key informants are hard to find. 110 00:07:47.400 --> 00:07:50.340 It's a good way of identifying networks. 111 00:07:50.340 --> 00:07:54.690 And I've used this in some of the supply chain work 112 00:07:54.690 --> 00:07:59.010 where I ask folks, who do you buy from, 113 00:07:59.010 --> 00:08:00.930 and then who do you sell to? 114 00:08:00.930 --> 00:08:05.070 And that's a good way to find other subjects 115 00:08:05.070 --> 00:08:06.963 that are knowledgeable about this. 116 00:08:09.540 --> 00:08:14.540 The last non-probability method is quota sampling, 117 00:08:16.190 --> 00:08:19.703 where you think about what are the characteristics 118 00:08:22.530 --> 00:08:27.327 that you want and you sort of want a certain percentage 119 00:08:28.827 --> 00:08:31.830 of genders, of ages, of ethnicities, 120 00:08:31.830 --> 00:08:36.830 things like liberals or conservatives, 121 00:08:37.170 --> 00:08:39.060 believers or skeptics. 122 00:08:39.060 --> 00:08:40.770 When you're interviewing businesses, 123 00:08:40.770 --> 00:08:44.973 things like different sizes, different industries, 124 00:08:46.971 --> 00:08:51.393 number of employees, experience and years in business.
125 00:08:52.950 --> 00:08:57.780 You think about I wanna have at least one or at least some 126 00:08:57.780 --> 00:09:02.780 or a certain percentage of my respondents sort of falling 127 00:09:03.900 --> 00:09:08.410 into one or more of these categories 128 00:09:09.267 --> 00:09:12.180 and sort of sampling until you find that, 129 00:09:12.180 --> 00:09:14.883 until you sort of meet those quotas. 130 00:09:18.709 --> 00:09:21.090 So to discuss this, again, it's the idea 131 00:09:21.090 --> 00:09:25.770 of maximum variability sampling that you want to make sure 132 00:09:25.770 --> 00:09:28.650 that you sample from a breadth of folks 133 00:09:28.650 --> 00:09:31.260 that have a diversity of opinion 134 00:09:31.260 --> 00:09:33.663 and experience and viewpoints. 135 00:09:36.915 --> 00:09:38.940 And a good question to begin with 136 00:09:38.940 --> 00:09:42.340 is what are the dimensions of variability 137 00:09:43.814 --> 00:09:46.020 and how do you make sure that you include folks 138 00:09:46.020 --> 00:09:51.020 that sort of represent different sort of points 139 00:09:51.600 --> 00:09:55.080 along those various dimensions. 140 00:09:55.080 --> 00:09:58.080 And when we start to collect data, 141 00:09:58.080 --> 00:10:00.300 I'm going to have you start to think about 142 00:10:00.300 --> 00:10:03.490 what are some of the important dimensions 143 00:10:04.765 --> 00:10:06.315 of variability of UVM students? 144 00:10:08.370 --> 00:10:10.650 The attributes may be their major, 145 00:10:10.650 --> 00:10:14.820 their class rank, where they live, race, 146 00:10:14.820 --> 00:10:19.820 gender, ethnicity, things like that, 147 00:10:19.890 --> 00:10:24.010 that will be important for the research question 148 00:10:26.620 --> 00:10:28.023 and topic at hand. 149 00:10:32.332 --> 00:10:37.332 So to sum up, non-probability, sometimes it's the only way, 150 00:10:40.170 --> 00:10:42.120 sometimes it's the best way. 
151 00:10:42.120 --> 00:10:44.520 But one thing that you can't generally do 152 00:10:44.520 --> 00:10:48.487 is you cannot generalize, that you can't say, 153 00:10:48.487 --> 00:10:52.590 "Well, these folks hold this view, 154 00:10:52.590 --> 00:10:55.617 so therefore everybody does." 155 00:10:56.643 --> 00:11:00.210 And by having 156 00:11:00.210 --> 00:11:05.210 a non-generalizable non-probability sample, 157 00:11:06.630 --> 00:11:10.770 there may be selection biases 158 00:11:10.770 --> 00:11:13.803 where you're only talking to a certain group. 159 00:11:14.910 --> 00:11:19.910 When you sort of go to a farmer's market 160 00:11:23.429 --> 00:11:25.310 and have a table or things like that, 161 00:11:25.310 --> 00:11:27.960 and you have folks sort of stop by, 162 00:11:27.960 --> 00:11:30.030 more likely you're gonna have folks 163 00:11:30.030 --> 00:11:32.763 that have a strong opinion one way or the other. 164 00:11:33.600 --> 00:11:35.640 Same with those who sort of call in to a show 165 00:11:35.640 --> 00:11:37.290 or take an online poll, 166 00:11:37.290 --> 00:11:40.083 those folks tend to have stronger opinions, 167 00:11:41.057 --> 00:11:42.980 and so you really can't generalize. 168 00:11:46.860 --> 00:11:51.860 And certainly if you only talk to viewers of Fox News 169 00:11:52.080 --> 00:11:57.080 or only talk to viewers of MSNBC, you cannot generalize 170 00:12:01.440 --> 00:12:06.440 as to the views and opinions that are expressed there 171 00:12:06.510 --> 00:12:11.510 and say that's how all Americans feel. 172 00:12:15.510 --> 00:12:17.103 I think it's important though, 173 00:12:18.817 --> 00:12:22.953 do not worship at the altar of probability sampling. 174 00:12:26.430 --> 00:12:31.430 Sometimes a probability sample makes sense because again, 175 00:12:32.070 --> 00:12:37.070 you get to generalize and avoid those selection biases.
176 00:12:40.860 --> 00:12:45.210 But there are many, many times when a non-probability sample 177 00:12:45.210 --> 00:12:46.533 will be more useful. 178 00:12:47.533 --> 00:12:50.490 And I invite you to think about when that might be. 179 00:12:50.490 --> 00:12:53.557 And I also really like this quotation, 180 00:12:53.557 --> 00:12:56.850 "All models are wrong, some models are useful," 181 00:12:56.850 --> 00:13:00.500 so sometimes to get more useful information, 182 00:13:00.500 --> 00:13:05.500 it makes sense to use a non-probability sample. 183 00:13:10.110 --> 00:13:14.110 So the opposite of a non-probability sample 184 00:13:14.993 --> 00:13:16.560 is then a probability sample. 185 00:13:16.560 --> 00:13:21.060 Here you do know the probability 186 00:13:21.060 --> 00:13:24.540 of being selected into the sample. 187 00:13:24.540 --> 00:13:29.540 So if my class has 50 students 188 00:13:30.780 --> 00:13:34.410 and I choose five to take a survey, 189 00:13:34.410 --> 00:13:39.410 sort of put all of your names into a hat 190 00:13:39.420 --> 00:13:42.480 and pull out five, then you have a one in 10 chance. 191 00:13:42.480 --> 00:13:45.463 So, you know the probability. 192 00:13:49.380 --> 00:13:52.467 And that's an example of a random selection. 193 00:13:55.350 --> 00:14:00.350 Just sort of pulling names out of a hat. 194 00:14:00.390 --> 00:14:05.347 And one way that's very commonly used is this, 195 00:14:07.628 --> 00:14:11.643 where everyone has an equal chance of being selected. 196 00:14:14.580 --> 00:14:16.980 And this is called the equal probability 197 00:14:16.980 --> 00:14:18.603 of selection method. 198 00:14:19.650 --> 00:14:22.740 And while no sample is perfect, 199 00:14:22.740 --> 00:14:26.670 it does both eliminate the obvious biases 200 00:14:26.670 --> 00:14:28.410 like selection bias.
201 00:14:28.410 --> 00:14:31.800 And you can also measure the degree 202 00:14:31.800 --> 00:14:36.147 to which the sample looks like the population 203 00:14:38.082 --> 00:14:40.840 and you can generalize with much more 204 00:14:43.200 --> 00:14:46.860 sort of confidence that the views 205 00:14:46.860 --> 00:14:51.753 of your sample reflect the larger population. 206 00:14:55.950 --> 00:14:59.190 When we say that a sample is representative, 207 00:14:59.190 --> 00:15:03.390 it means that it looks like the population, 208 00:15:03.390 --> 00:15:07.470 that it sort of has similar percentages 209 00:15:07.470 --> 00:15:12.470 of attributes as the overall population. 210 00:15:15.266 --> 00:15:18.430 And this does allow for generalizability, 211 00:15:22.500 --> 00:15:25.120 that if you have a large enough sample 212 00:15:26.032 --> 00:15:28.380 you can say with a good amount of confidence 213 00:15:28.380 --> 00:15:33.380 that sort of everybody feels the way 214 00:15:34.319 --> 00:15:37.980 that the sample responded. 215 00:15:37.980 --> 00:15:41.160 One thing that you need here is a list 216 00:15:41.160 --> 00:15:43.263 of all possible subjects. 217 00:15:44.610 --> 00:15:49.610 And here are two very similar ways, 218 00:15:50.370 --> 00:15:53.943 simple random sampling and systematic sampling. 219 00:15:55.200 --> 00:15:59.160 So in random sampling, you just give everybody a number 220 00:15:59.160 --> 00:16:03.690 and then you use a random number generator from a computer 221 00:16:03.690 --> 00:16:06.663 or from a table and just use those folks. 222 00:16:10.871 --> 00:16:14.853 And if you're using the systematic method, 223 00:16:17.670 --> 00:16:20.403 you select every Nth unit. 224 00:16:25.438 --> 00:16:26.790 So you make a long numbered list, 225 00:16:26.790 --> 00:16:28.890 you pick the first one at random, 226 00:16:28.890 --> 00:16:31.890 and then you see how many that you need.
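A rough sketch of the two list-based methods just described, simple random sampling and systematic sampling, written in Python. The 50-person list, the seed, and the function names `simple_random_sample` and `systematic_sample` are assumptions made up for illustration, not something taken from the lecture slides.

```python
import random


def simple_random_sample(frame, n, rng):
    """Give every unit on the list the same chance; draw n without replacement."""
    return rng.sample(frame, n)


def systematic_sample(frame, n, rng):
    """Pick a random starting point, then take every k-th unit on the list."""
    k = len(frame) // n          # sampling interval ("every Nth unit")
    start = rng.randrange(k)     # random start within the first interval
    return frame[start::k][:n]


rng = random.Random(2024)        # fixed seed so the sketch is repeatable
frame = list(range(1, 51))       # hypothetical numbered list of 50 students

srs = simple_random_sample(frame, 5, rng)
sys_sample = systematic_sample(frame, 5, rng)

print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
```

With a list of 50 and a target of 5, the systematic version takes every 10th unit after a random start, so its picks are always 10 apart, while the simple random picks can land anywhere on the list.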
227 00:16:31.890 --> 00:16:35.800 So if you have a population of 10,000 228 00:16:36.911 --> 00:16:39.210 and you need 1,000 respondents, 229 00:16:39.210 --> 00:16:44.210 you start at a random point and then just sample every 10th 230 00:16:44.400 --> 00:16:46.713 until you get to the number you need. 231 00:16:51.287 --> 00:16:56.287 So we will look at this in class, but this website 232 00:16:59.730 --> 00:17:04.230 tells you how large of a sample you need 233 00:17:04.230 --> 00:17:06.360 to be able to generalize. 234 00:17:06.360 --> 00:17:11.100 And it depends on two things that we're gonna talk about 235 00:17:11.100 --> 00:17:14.610 here in a few minutes. 236 00:17:14.610 --> 00:17:18.660 The confidence level, how certain do you wanna be, 237 00:17:18.660 --> 00:17:20.843 you wanna be 90% sure, you wanna be 95% sure, 238 00:17:22.947 --> 00:17:25.713 you wanna be 99% sure. 239 00:17:25.713 --> 00:17:29.550 And the confidence interval, thinking about like plus 240 00:17:29.550 --> 00:17:32.433 or minus how many percentage points, 241 00:17:34.080 --> 00:17:38.190 sort of how close do we need to be? 242 00:17:38.190 --> 00:17:41.987 And it also depends on population size, 243 00:17:45.390 --> 00:17:50.390 so a larger population needs a larger sample, 244 00:17:53.070 --> 00:17:58.053 as well as how homogeneous the answers are. 245 00:17:59.520 --> 00:18:02.050 Now I'm gonna give you some vocabulary 246 00:18:03.248 --> 00:18:07.320 that describes how certain you can be 247 00:18:07.320 --> 00:18:10.863 that your sample is generalizable. 248 00:18:13.140 --> 00:18:16.260 So, first we're gonna talk about parameters. 249 00:18:16.260 --> 00:18:21.260 That is the true value of the entire population. 250 00:18:21.270 --> 00:18:24.880 That if you sampled everyone 251 00:18:26.513 --> 00:18:29.070 and you got everybody's response, you would know 252 00:18:29.070 --> 00:18:32.430 that that is the true response.
253 00:18:32.430 --> 00:18:35.133 And it might be expressed as median, mean or mode, 254 00:18:36.254 --> 00:18:38.640 it might be expressed as the frequencies. 255 00:18:38.640 --> 00:18:42.976 And just as a made-up example, say that we talked 256 00:18:42.976 --> 00:18:45.390 to every single UVM student 257 00:18:45.390 --> 00:18:48.780 and found that 65% own an iPhone. 258 00:18:48.780 --> 00:18:51.746 Again, totally made-up example, 259 00:18:51.746 --> 00:18:54.303 but just to give you an idea. 260 00:18:56.070 --> 00:19:01.070 The statistic is the measure of the sample. 261 00:19:02.983 --> 00:19:07.983 It's the estimated mean or the estimated frequency. 262 00:19:10.343 --> 00:19:13.497 It's the estimate of the true parameter, 263 00:19:16.471 --> 00:19:20.070 the true value, that we make 264 00:19:20.070 --> 00:19:23.880 by drawing a sample and asking them. 265 00:19:23.880 --> 00:19:25.800 And let's say that we draw a sample 266 00:19:25.800 --> 00:19:30.300 and say we find that 68% of those 267 00:19:30.300 --> 00:19:33.690 that we talk to own an iPhone. 268 00:19:33.690 --> 00:19:37.473 So in this case it's off by three percentage points. 269 00:19:41.342 --> 00:19:43.500 The sampling error is the difference 270 00:19:43.500 --> 00:19:46.533 between the parameter and the statistic. 271 00:19:47.653 --> 00:19:50.403 And it's the error incurred by not talking to everyone. 272 00:19:51.705 --> 00:19:54.093 And it depends on three things. 273 00:19:55.263 --> 00:19:57.780 The larger the N, the less the error; 274 00:19:57.780 --> 00:20:02.130 the more diverse the population, 275 00:20:02.130 --> 00:20:06.060 the greater the range of answers, 276 00:20:06.060 --> 00:20:10.710 the greater the error; and the confidence level, 277 00:20:12.706 --> 00:20:17.706 that the more precision that you need, 278 00:20:18.030 --> 00:20:20.290 the more that you need to narrow in, 279 00:20:21.649 --> 00:20:23.913 the greater the chance of error.
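To make the three terms concrete, here is a small Python simulation, offered as a sketch rather than anything from the lecture itself. It builds a pretend population of 10,000 students in which exactly 65% own an iPhone (the lecture's made-up figure), draws one random sample, and compares the statistic to the parameter. The population size, the sample size of 400, and the seed are assumptions chosen only for illustration.

```python
import random

rng = random.Random(7)  # fixed seed so the sketch is repeatable

# Hypothetical population: 1 = owns an iPhone, 0 = does not.
population = [1] * 6500 + [0] * 3500

# Parameter: the true value, known only because we "asked" everyone.
parameter = sum(population) / len(population)

# Statistic: the estimate from a random sample of 400 students.
sample = rng.sample(population, 400)
statistic = sum(sample) / len(sample)

# Sampling error: the gap between the statistic and the parameter.
sampling_error = statistic - parameter

print(f"parameter      = {parameter:.3f}")
print(f"statistic      = {statistic:.3f}")
print(f"sampling error = {sampling_error:+.3f}")
```

Changing the seed or the sample size and rerunning gives a feel for how much the statistic moves around the parameter from one draw to the next.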
280 00:20:26.160 --> 00:20:30.130 So, here's a very simple formula 281 00:20:32.326 --> 00:20:34.623 and it's for a binary yes or no. 282 00:20:37.035 --> 00:20:38.135 Like do you own a bike 283 00:20:40.677 --> 00:20:42.527 or did you vote in the last election? 284 00:20:43.714 --> 00:20:44.903 And it's the square root. 285 00:20:46.078 --> 00:20:50.430 And here p is the proportion who say yes, 286 00:20:50.430 --> 00:20:53.100 and 1-p is the proportion who say no. 287 00:20:53.100 --> 00:20:55.680 So it's the square root of p 288 00:20:55.680 --> 00:20:59.193 times 1-p all divided by N. 289 00:21:00.284 --> 00:21:04.697 So you see that as N gets larger and larger, 290 00:21:06.930 --> 00:21:11.313 your sampling error goes down, 291 00:21:12.151 --> 00:21:14.850 and if everybody says the same thing, 292 00:21:14.850 --> 00:21:17.890 like are you a UVM student 293 00:21:18.943 --> 00:21:21.603 or are you a carbon based life form, 294 00:21:22.895 --> 00:21:24.870 then the sampling error would be zero. 295 00:21:24.870 --> 00:21:29.870 And the greatest sampling error would be when p equals 0.5, 296 00:21:34.117 --> 00:21:37.350 that when they're sort of evenly distributed, 297 00:21:37.350 --> 00:21:41.490 same numbers across the two responses. 298 00:21:41.490 --> 00:21:43.323 That's the greatest error. 299 00:21:47.790 --> 00:21:51.300 So now we're gonna talk about 300 00:21:51.300 --> 00:21:55.627 how certain can you be that the statistic 301 00:22:00.090 --> 00:22:04.530 lies within a certain range of the parameter. 302 00:22:04.530 --> 00:22:07.800 And it depends on how certain you wanna be 303 00:22:07.800 --> 00:22:10.083 and how precise you wanna be. 304 00:22:13.271 --> 00:22:16.080 So the Vermonter poll, which is an annual sort 305 00:22:16.080 --> 00:22:20.590 of public opinion poll done by the UVM Center for Rural Studies, 306 00:22:21.510 --> 00:22:25.020 is a so-called 95/5 poll.
307 00:22:25.020 --> 00:22:30.020 And it's done so that they are 95% certain 308 00:22:30.420 --> 00:22:35.420 that the statistic that they measure is within plus 309 00:22:37.560 --> 00:22:40.623 or minus five percentage points of the true value. 310 00:22:41.922 --> 00:22:45.540 So if, again, hypothetical example, 65% say 311 00:22:45.540 --> 00:22:50.540 that creating jobs is the greatest priority in the state, 312 00:22:53.151 --> 00:22:57.040 then we're 95% sure that the true value lies between 60 313 00:22:58.364 --> 00:23:01.413 and 70, plus or minus five points. 314 00:23:02.580 --> 00:23:07.470 And the more certain that we wanna be, 315 00:23:07.470 --> 00:23:12.470 if we wanna be 99% sure, then it's going to be greater 316 00:23:14.160 --> 00:23:18.360 than five percentage points plus or minus. 317 00:23:21.720 --> 00:23:26.110 But if we only need to be 90% sure, it can be less 318 00:23:27.050 --> 00:23:30.201 than five percentage points. 319 00:23:30.201 --> 00:23:31.410 And see if that makes sense 320 00:23:31.410 --> 00:23:35.553 and let's discuss that as needed in class. 321 00:23:38.549 --> 00:23:42.189 So again, we learned about confidence levels 322 00:23:42.189 --> 00:23:47.189 and confidence intervals and in class we'll do an exercise 323 00:23:48.060 --> 00:23:51.843 of how many respondents we need for say a 95/5 poll. 324 00:23:53.370 --> 00:23:54.573 That's what we just did. 325 00:23:55.470 --> 00:23:56.610 Thank you for watching. 326 00:23:56.610 --> 00:23:57.443 Have a good day.
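As a possible warm-up for the in-class 95/5 exercise, here is a hedged Python sketch. It uses the standard form of the yes/no sampling-error formula, the square root of p(1-p)/N, and then makes two assumptions that the lecture does not spell out: the plus-or-minus range is taken to be a normal-approximation z value times that sampling error, and the population-size adjustment is the usual finite population correction. The function names and the population of 10,000 are made up for illustration; the website mentioned in the lecture may calculate things differently.

```python
import math
from statistics import NormalDist


def standard_error(p, n):
    """Sampling error for a yes/no question: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def z_value(confidence_level):
    """Two-sided z multiplier, about 1.96 for a 95% confidence level."""
    return NormalDist().inv_cdf(0.5 + confidence_level / 2)


def margin_of_error(p, n, confidence_level=0.95):
    """Plus-or-minus range around the statistic (normal approximation)."""
    return z_value(confidence_level) * standard_error(p, n)


def sample_size(confidence_level=0.95, margin=0.05, p=0.5, population=None):
    """Respondents needed; p = 0.5 is the worst case (largest error)."""
    n0 = z_value(confidence_level) ** 2 * p * (1 - p) / margin ** 2
    if population is not None:
        # Finite population correction: smaller populations need fewer.
        n0 = n0 / (1 + (n0 - 1) / population)
    return math.ceil(n0)


print(sample_size(0.95, 0.05))                     # 385
print(sample_size(0.95, 0.05, population=10_000))  # 370

# Same sample, different confidence levels: more certainty, wider range.
for level in (0.90, 0.95, 0.99):
    print(level, round(margin_of_error(0.65, 385, level), 3))
```

The loop at the end mirrors the point made in the lecture: holding the sample fixed, asking for 99% certainty widens the plus-or-minus range, and settling for 90% narrows it.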