[Instructor] So this is a really cool online app that helps us see just how powerful the central limit theorem can be. We start in this top graph here, where we can choose the parent population that we want to look at. We could choose a normal distribution or a skewed distribution, or we can create our own distribution. That's what we're going to do: create something completely wacky to see how the central limit theorem works. So let's just draw a distribution that's totally crazy. It's something like that: there's a big spike here, we go down here, another spike, and some more data like this. So that's a very non-normal distribution.

Now, say we want to draw a sample of size N=5 from this distribution. We draw five pieces of sample data, and from them we get a sample mean. We do this again, five new pieces of data, and from those we compute another mean. We do that a bunch more times, say another 5, 10, 15 times. Say we do it 10,000 times: now we get a distribution of the sample means that looks like this.
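The repeated-sampling procedure the app runs can be sketched in a few lines of Python. This is a minimal sketch that rests on one assumption: `POPULATION` is a made-up stand-in for the hand-drawn distribution, with two tall spikes plus a flat stretch of values. The app's actual drawn data isn't available. `sample_means` is a hypothetical helper. It draws `reps` samples of size `n` with replacement and returns each sample's mean.

```python
import random
import statistics

# Hypothetical "wacky" parent population (NOT the one drawn in the app):
# a tall spike at 2, a spike at 28, and a flat run of values 10..19.
POPULATION = [2] * 400 + [28] * 300 + list(range(10, 20)) * 30

def sample_means(population, n, reps, seed=0):
    """Draw `reps` samples of size `n` (with replacement); return their means."""
    rng = random.Random(seed)  # seeded so reruns give the same numbers
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(reps)]

# Samples of N=5, repeated 10,000 times, as in the video.
means = sample_means(POPULATION, n=5, reps=10_000)
print(len(means))  # 10000
print(round(statistics.fmean(means), 2))  # close to the population mean
```

If you plot a histogram of `means`, it should look roughly bell-shaped even though `POPULATION` is clearly not.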
And if we fit a normal curve, we can see that it's very close to normal. If we go over to the sample statistics, we can see that the mean of our original distribution was 15.46, and the mean of our distribution of sample means is also 15.4-something, so it's very close. Then we take out our calculators and look at the standard deviation. Our original standard deviation is 9.89, and if we divide that by the square root of five, we get 4.42 as the standard error of our sampling distribution, which is really close to what we have here.

So even though we started out with this crazy, wacky distribution, the distribution of the sample means is indeed approximately normal, with a mean equal to the population mean and a standard error equal to the population standard deviation divided by the square root of the sample size.

So what if we change the sample size? Say that instead of taking samples of N=5, we take samples of N=25.
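Here is a quick way to check the two claims above, that the mean of the sample means matches the population mean and that their spread matches σ/√n. The sketch first reproduces the video's arithmetic (9.89 / √5) and then compares both quantities against a simulation. As before, `POPULATION` is a hypothetical multimodal population, not the app's. The population standard deviation is computed with `statistics.pstdev`, which divides by N.

```python
import math
import random
import statistics

# Arithmetic from the video: population SD 9.89, sample size 5.
print(round(9.89 / math.sqrt(5), 2))  # 4.42

# Hypothetical multimodal population (an assumption, not the app's data).
POPULATION = [2] * 400 + [28] * 300 + list(range(10, 20)) * 30
mu = statistics.fmean(POPULATION)
sigma = statistics.pstdev(POPULATION)  # population SD (divides by N)

rng = random.Random(1)
n = 5
means = [statistics.fmean(rng.choices(POPULATION, k=n)) for _ in range(10_000)]

print(f"population mean      {mu:.2f}")
print(f"mean of sample means {statistics.fmean(means):.2f}")
print(f"sigma / sqrt(n)      {sigma / math.sqrt(n):.2f}")
print(f"SD of sample means   {statistics.stdev(means):.2f}")
```

The last two lines should agree to roughly the first decimal place. That is the simulated counterpart of "really close to what we have here."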
Now we're pulling 25 pieces of data from our original parent population and getting a mean from those data, there it is. Say we do that five more times, another five, another five, and then 10,000 times like we did before. We get another almost normal distribution of sample means, with the same mean as the original population and a standard error equal to the population standard deviation divided by the square root of, now, 25. So let's do that: 9.89 divided by the square root of 25 is 1.98, which is really close to what we see here. So you can see that, again, we have an approximately normal distribution, but it's much tighter around the mean. As we increase the size of our sample, the sample means cluster much more tightly, giving us a much more precise estimate of the population mean.
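To see the tightening numerically, we can run the same simulation at N=5 and N=25 and compare the spread of the sample means. This is a sketch that assumes the same hypothetical `POPULATION` as the earlier examples. `sd_of_sample_means` is an illustrative helper, not anything provided by the app. Because the standard error is σ/√n, going from 5 to 25 should shrink it by a factor of about √5 ≈ 2.24.

```python
import math
import random
import statistics

# Arithmetic from the video: 9.89 / sqrt(25).
print(round(9.89 / math.sqrt(25), 2))  # 1.98

# Hypothetical multimodal population (an assumption, not the app's data).
POPULATION = [2] * 400 + [28] * 300 + list(range(10, 20)) * 30
sigma = statistics.pstdev(POPULATION)

def sd_of_sample_means(n, reps=10_000, seed=2):
    """Simulated standard error: SD of `reps` sample means of size `n`."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(POPULATION, k=n)) for _ in range(reps)]
    return statistics.stdev(means)

for n in (5, 25):
    # simulated SE vs. theoretical sigma / sqrt(n)
    print(n, round(sd_of_sample_means(n), 2), round(sigma / math.sqrt(n), 2))
```

Quadrupling precision this way costs data: halving the standard error requires four times the sample size.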