1
00:00:00,990 --> 00:00:03,690
[Instructor] So this is
a really cool online app

2
00:00:03,690 --> 00:00:05,880
that helps us see just how powerful

3
00:00:05,880 --> 00:00:08,130
the central limit theorem can be.

4
00:00:08,130 --> 00:00:10,440
So we start in this top graph here

5
00:00:10,440 --> 00:00:12,450
and we can choose the parent population

6
00:00:12,450 --> 00:00:14,040
that we want to look at.

7
00:00:14,040 --> 00:00:16,470
We could choose a normal distribution,

8
00:00:16,470 --> 00:00:19,200
or a skewed distribution,

9
00:00:19,200 --> 00:00:21,030
or we can create our own distribution.

10
00:00:21,030 --> 00:00:21,990
And that's what we're gonna do

11
00:00:21,990 --> 00:00:24,030
to create something completely wacky

12
00:00:24,030 --> 00:00:26,820
to see how the central
limit theorem will work.

13
00:00:26,820 --> 00:00:31,388
So let's just draw a distribution
that's totally crazy.

14
00:00:31,388 --> 00:00:33,840
Something like that,
with a big spike here,

15
00:00:33,840 --> 00:00:36,720
we go down here, another spike,

16
00:00:36,720 --> 00:00:38,613
and some more data like this.

17
00:00:40,147 --> 00:00:42,990
So that's a very non-normal distribution.

18
00:00:42,990 --> 00:00:46,560
Now, if we want to draw a
sample from this distribution

19
00:00:46,560 --> 00:00:51,560
of, say, N=5, we draw five
pieces of sample data,

20
00:00:53,100 --> 00:00:55,890
and from that, we have a sample mean.

21
00:00:55,890 --> 00:00:59,040
So we do this again,
five new pieces of data,

22
00:00:59,040 --> 00:01:01,410
and from that, we get another sample mean.

23
00:01:01,410 --> 00:01:04,703
We do that a bunch more times.

24
00:01:04,703 --> 00:01:07,260
So say another 5, 10, 15 times.

25
00:01:07,260 --> 00:01:09,540
Say we do it 10,000 times,

26
00:01:09,540 --> 00:01:11,700
now we get a distribution
of the sample means

27
00:01:11,700 --> 00:01:13,050
that looks like this.

28
00:01:13,050 --> 00:01:15,030
And if we fit a normal curve,

29
00:01:15,030 --> 00:01:18,270
we can see that it's
very close to normal.

30
00:01:18,270 --> 00:01:20,010
If we go over to the sample statistics,

31
00:01:20,010 --> 00:01:21,780
we can see that the mean

32
00:01:21,780 --> 00:01:25,140
of our original distribution was 15.46,

33
00:01:25,140 --> 00:01:28,560
and the mean of our distribution
of sample means

34
00:01:28,560 --> 00:01:32,700
is also 15.4-something, so
the two are very close.
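The repeated-sampling procedure in these cues can be sketched in a few lines of Python. This is a minimal sketch, not the app's code: the spiky parent population (`draw_wacky_value`, two spikes plus a flat stretch of values) is a hypothetical stand-in for the hand-drawn distribution, and its mean of 14.3 is a property of this made-up mixture, not of the data in the video.

```python
import random
import statistics

def draw_wacky_value(rng):
    """One draw from a hypothetical spiky population (illustrative, not the app's data)."""
    u = rng.random()
    if u < 0.4:
        return rng.gauss(5, 1)       # first tall spike near 5
    if u < 0.7:
        return rng.gauss(25, 1)      # second spike near 25
    return rng.uniform(0, 32)        # "some more data" spread across the range

def simulate_sample_means(n=5, reps=10_000, seed=0):
    """Repeat the procedure: draw n values, record their mean, `reps` times."""
    rng = random.Random(seed)
    return [statistics.fmean([draw_wacky_value(rng) for _ in range(n)])
            for _ in range(reps)]

# Theoretical mean of this mixture: 0.4*5 + 0.3*25 + 0.3*16 = 14.3
means = simulate_sample_means()
print(f"mean of the sample means: {statistics.fmean(means):.2f} (population mean 14.3)")
```

Plotting a histogram of `means` would show the roughly bell-shaped curve the app draws, even though no single draw comes from a normal distribution.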

35
00:01:32,700 --> 00:01:34,050
We take out our calculators

36
00:01:34,050 --> 00:01:36,480
and we look at the standard deviation.

37
00:01:36,480 --> 00:01:39,690
Our original standard deviation is 9.89,

38
00:01:39,690 --> 00:01:42,933
and if we divide that by
the square root of five,

39
00:01:44,160 --> 00:01:48,270
we get 4.42 as the standard error

40
00:01:48,270 --> 00:01:49,980
of our sampling distribution,

41
00:01:49,980 --> 00:01:52,230
and that's really close
to what we have here.
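The calculator step is just the standard-error formula, population standard deviation divided by the square root of the sample size. A small sketch, where `standard_error` is a hypothetical helper name used only for illustration:

```python
import math

def standard_error(sigma, n):
    """Standard error of the sample mean: population SD divided by sqrt(sample size)."""
    return sigma / math.sqrt(n)

print(round(standard_error(9.89, 5), 2))   # n = 5, as in the video: 4.42
```

The same helper covers the later N=25 case, where 9.89 divided by the square root of 25 rounds to 1.98.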

42
00:01:53,070 --> 00:01:56,280
So even though we've got
this crazy wacky distribution

43
00:01:56,280 --> 00:01:59,250
to start out with, the
distribution of the sample means

44
00:01:59,250 --> 00:02:02,340
is indeed approximately normal

45
00:02:02,340 --> 00:02:05,070
with a mean equal
to our population mean

46
00:02:05,070 --> 00:02:06,480
and the standard error

47
00:02:06,480 --> 00:02:09,870
equal to the population
standard deviation

48
00:02:09,870 --> 00:02:12,780
divided by the square
root of the sample size.
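Written as a formula, the result summarized in these cues is roughly the following, with μ and σ the population mean and standard deviation and n the sample size. The normal shape is an approximation that improves as n grows; the mean and standard-error identities hold for any n when sampling independently.

```latex
\bar{X} \overset{\text{approx.}}{\sim} N\!\left(\mu,\ \frac{\sigma^2}{n}\right),
\qquad
\mu_{\bar{X}} = \mu,
\qquad
\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{n}},
\qquad
\frac{9.89}{\sqrt{5}} \approx 4.42,\quad \frac{9.89}{\sqrt{25}} \approx 1.98
```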

49
00:02:12,780 --> 00:02:15,244
So what if we change the sample size?

50
00:02:15,244 --> 00:02:18,060
So say instead of taking samples of N=5,

51
00:02:18,060 --> 00:02:21,090
we take samples of N=25.

52
00:02:21,090 --> 00:02:22,770
So now, if we go through

53
00:02:22,770 --> 00:02:25,500
and now we're pulling 25 pieces of data

54
00:02:25,500 --> 00:02:27,580
from our original population distribution

55
00:02:28,560 --> 00:02:33,270
and we get a mean from
these data, there it is,

56
00:02:33,270 --> 00:02:37,740
say we do that five more times,
another five, another five,

57
00:02:37,740 --> 00:02:41,100
and now we do it 10,000
times like we did before,

58
00:02:41,100 --> 00:02:45,120
we see that we get another
almost normal distribution

59
00:02:45,120 --> 00:02:48,060
with the distribution
of the sample means

60
00:02:48,060 --> 00:02:50,223
having the same mean as the original,

61
00:02:51,420 --> 00:02:54,000
and the standard error
of the sample means

62
00:02:54,000 --> 00:02:56,400
equal to the standard deviation

63
00:02:56,400 --> 00:03:01,183
of the population divided
by the square root of, now, 25.

64
00:03:01,183 --> 00:03:02,017
So let's do that.

65
00:03:02,017 --> 00:03:06,700
So 9.89 divided by the
square root of 25 is 1.98,

66
00:03:09,620 --> 00:03:11,490
which is really close to what we see here.

67
00:03:11,490 --> 00:03:14,130
So you can see that, again,
we have an approximately normal distribution,

68
00:03:14,130 --> 00:03:16,590
but it's much tighter around the mean.

69
00:03:16,590 --> 00:03:19,110
So as we increase the size of our sample,

70
00:03:19,110 --> 00:03:22,290
we get a much more tightly
clustered estimate,

71
00:03:22,290 --> 00:03:24,270
a much more precise estimate

72
00:03:24,270 --> 00:03:27,183
of the population mean.
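The tightening described in these last cues can be checked by simulation. This is a sketch under stated assumptions: `POPULATION` is a hypothetical finite "wacky" population (two spikes plus a flat stretch), not the distribution drawn in the app, and samples are drawn with replacement so that σ/√n is the exact standard error. It compares the observed spread of 10,000 sample means with σ/√n for n = 5 and n = 25.

```python
import math
import random
import statistics

# Hypothetical finite population: two spikes plus a flat stretch of values.
# These numbers are illustrative, not the data drawn in the app.
POPULATION = [3] * 400 + [27] * 300 + list(range(0, 33)) * 10

def spread_of_sample_means(n, reps=10_000, seed=42):
    """Return (empirical SD of `reps` sample means, theoretical sigma/sqrt(n))."""
    rng = random.Random(seed)
    means = [statistics.fmean(rng.choices(POPULATION, k=n)) for _ in range(reps)]
    sigma = statistics.pstdev(POPULATION)   # population SD, not sample SD
    return statistics.stdev(means), sigma / math.sqrt(n)

for n in (5, 25):
    empirical, theoretical = spread_of_sample_means(n)
    print(f"n={n:>2}: empirical SE {empirical:.2f}, sigma/sqrt(n) {theoretical:.2f}")
```

Because the standard error scales with 1/√n, going from n = 5 to n = 25 should shrink the spread of the sample means by a factor of about √5 ≈ 2.24.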