WEBVTT
1
00:00:03.480 --> 00:00:05.280
Hello, students.
2
00:00:05.280 --> 00:00:08.643
We are now going to get to problem 20.
3
00:00:11.100 --> 00:00:12.630
This is, again, a new problem
4
00:00:12.630 --> 00:00:15.150
and here we have the raw dataset.
5
00:00:15.150 --> 00:00:18.510
The following data are a sample of white blood cell counts
6
00:00:18.510 --> 00:00:22.470
in thousands of cells per cubic millimeter for nine patients
7
00:00:22.470 --> 00:00:26.883
entering a hospital in Boston, Massachusetts on a given day.
8
00:00:28.110 --> 00:00:29.730
So, the question is asking us
9
00:00:29.730 --> 00:00:32.430
are there any outliers in this dataset?
10
00:00:32.430 --> 00:00:35.010
And of course, we have to justify our answer.
11
00:00:35.010 --> 00:00:36.600
So, what do we have to do here?
12
00:00:36.600 --> 00:00:38.430
As you may have understood
13
00:00:38.430 --> 00:00:41.010
from our previous problems that we have solved,
14
00:00:41.010 --> 00:00:43.653
we need to compute the quartiles.
15
00:00:44.790 --> 00:00:47.370
So, we have to first compute the quartiles
16
00:00:47.370 --> 00:00:49.800
and kind of make a plan for this.
17
00:00:49.800 --> 00:00:51.690
We have to compute the quartiles
18
00:00:51.690 --> 00:00:56.050
and then we have to calculate the lower
19
00:00:57.150 --> 00:01:01.180
and upper limits, right, for the two (indistinct).
20
00:01:01.180 --> 00:01:02.523
Okay, now,
21
00:01:04.230 --> 00:01:06.540
before we can calculate the quartiles
22
00:01:06.540 --> 00:01:07.950
what's the first thing we have to do?
23
00:01:07.950 --> 00:01:09.900
We need the ordered set.
24
00:01:09.900 --> 00:01:14.250
So, again here I'm going to give you the ordered set.
25
00:01:14.250 --> 00:01:18.900
Again, very easily we can use Excel and calculate this.
26
00:01:18.900 --> 00:01:20.130
So, let's see here
27
00:01:20.130 --> 00:01:25.130
is 3, 5, 7,
28
00:01:26.940 --> 00:01:31.383
8, 8, 9,
29
00:01:32.730 --> 00:01:36.930
10, 12, and 35.
30
00:01:36.930 --> 00:01:39.540
Okay, so again, this is our ordered set.
31
00:01:39.540 --> 00:01:42.210
So, what I want to do right now is number them
32
00:01:42.210 --> 00:01:46.800
using another color so it is easier for us to see it.
33
00:01:46.800 --> 00:01:49.860
So, this is 1, this is 2, this is 3,
34
00:01:49.860 --> 00:01:52.260
this is 4, this is 5,
35
00:01:52.260 --> 00:01:56.010
6, 7, 8, and 9.
36
00:01:56.010 --> 00:01:58.500
And because, again, we have 9,
37
00:01:58.500 --> 00:02:00.300
which is an odd number,
38
00:02:00.300 --> 00:02:04.170
it's very clear that this is going to be our median.
39
00:02:04.170 --> 00:02:07.710
Okay, so the fifth place, which is the value 8,
40
00:02:07.710 --> 00:02:09.523
is our median, and this is Q2.
41
00:02:10.470 --> 00:02:15.133
So, now the question is, what is Q1 and what is Q3?
42
00:02:16.020 --> 00:02:18.540
So, the Q1 will be, again,
43
00:02:18.540 --> 00:02:23.070
the median of the lower values,
44
00:02:23.070 --> 00:02:26.250
basically the values that are below our median
45
00:02:26.250 --> 00:02:29.823
and that will be, again, I'm going to use another color.
46
00:02:30.930 --> 00:02:33.900
So, that's basically this data set.
47
00:02:33.900 --> 00:02:35.910
And of this, what's in the middle?
48
00:02:35.910 --> 00:02:36.743
These two.
49
00:02:38.220 --> 00:02:39.570
And because it's an even number,
50
00:02:39.570 --> 00:02:41.430
we need to make an average of these two.
51
00:02:41.430 --> 00:02:45.870
So, this is Q1, so it is five plus seven.
52
00:02:45.870 --> 00:02:49.200
And again, I always like to put a parenthesis around
53
00:02:49.200 --> 00:02:51.240
because that makes me realize
54
00:02:51.240 --> 00:02:54.210
I gotta do the addition before doing the division.
55
00:02:54.210 --> 00:02:56.760
So, this will be 12 divided by two.
56
00:02:56.760 --> 00:02:58.170
That is what, six?
57
00:02:58.170 --> 00:03:00.150
Correct, okay.
58
00:03:00.150 --> 00:03:01.920
Now we have to go to the upper part,
59
00:03:01.920 --> 00:03:04.200
and again I'm going to use another color.
60
00:03:04.200 --> 00:03:06.100
Let's see, I'm going to use
61
00:03:07.770 --> 00:03:09.698
maybe blue this time.
62
00:03:09.698 --> 00:03:13.860
Okay, so here these are basically the four values
63
00:03:13.860 --> 00:03:18.480
that are above
64
00:03:18.480 --> 00:03:20.700
our median, which is Q2.
65
00:03:20.700 --> 00:03:23.220
So, here again, what is going to be Q3?
66
00:03:23.220 --> 00:03:27.210
That's going to be the average of these two numbers,
67
00:03:27.210 --> 00:03:31.870
so Q3 is 10 plus 12
68
00:03:33.330 --> 00:03:35.010
divided by two.
69
00:03:35.010 --> 00:03:37.260
So, that will be what?
70
00:03:37.260 --> 00:03:38.490
That will be 11.
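
NOTE
A minimal Python sketch of the quartile step above, assuming the
median-of-halves method used in the lecture (with an odd number of values,
the median itself is left out of both halves). The function name and the
variable names are illustrative and do not come from the lecture.
```python
from statistics import median
#
def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    data = sorted(values)          # quartiles need the ordered set
    n = len(data)
    half = n // 2
    lower_half = data[:half]       # values before the median
    # For odd n, skip the median itself; for even n, start at the midpoint.
    upper_half = data[half + 1:] if n % 2 else data[half:]
    return median(lower_half), median(data), median(upper_half)
#
# Ordered set read out in the lecture (thousands of cells per cubic millimeter).
counts = [3, 5, 7, 8, 8, 9, 10, 12, 35]
q1, q2, q3 = quartiles(counts)
print(q1, q2, q3)  # Q1 = 6, Q2 = 8, Q3 = 11
```
Other quartile conventions (for example, interpolation-based ones used by
some software) can give slightly different Q1 and Q3 for the same data.
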
71
00:03:38.490 --> 00:03:41.160
Okay, so now what is our next step?
72
00:03:41.160 --> 00:03:43.980
Our next step is to calculate the IQR.
73
00:03:43.980 --> 00:03:45.580
So, what is going to be our IQR?
74
00:03:47.640 --> 00:03:51.040
Our IQR is going to be the difference between
75
00:03:55.470 --> 00:03:56.708
Q3 and Q1.
76
00:03:56.708 --> 00:03:59.353
So, IQR will be Q3 minus Q1,
77
00:04:02.820 --> 00:04:05.340
and that is going to be what?
78
00:04:05.340 --> 00:04:08.400
11 minus six, that is five.
79
00:04:08.400 --> 00:04:10.770
Correct. Okay, great.
80
00:04:10.770 --> 00:04:12.840
So, now what we need to do is calculate
81
00:04:12.840 --> 00:04:14.250
the lower limit and the upper limit.
82
00:04:14.250 --> 00:04:17.310
So, again, the lower limit,
83
00:04:17.310 --> 00:04:20.343
I'm going to again write the formulas.
84
00:04:21.960 --> 00:04:25.240
So, that will be Q1
85
00:04:26.940 --> 00:04:31.173
minus 1.5 multiplied by IQR.
86
00:04:32.340 --> 00:04:33.990
So, what is our Q1?
87
00:04:33.990 --> 00:04:35.520
We just calculated it.
88
00:04:35.520 --> 00:04:37.170
Q1 is 6,
89
00:04:37.170 --> 00:04:42.100
so it's going to be 6 minus 1.5
90
00:04:43.200 --> 00:04:44.850
multiplied by 5.
91
00:04:44.850 --> 00:04:46.260
So, it's going to be what?
92
00:04:46.260 --> 00:04:50.340
6 minus 7.5.
93
00:04:50.340 --> 00:04:51.690
So, what's gonna happen here?
94
00:04:51.690 --> 00:04:53.714
We're gonna have a negative number.
95
00:04:53.714 --> 00:04:56.550
-1.5, correct?
96
00:04:56.550 --> 00:05:01.470
Yes, so now we are going to go to the upper limit.
97
00:05:01.470 --> 00:05:03.960
So, the upper limit, again, it seems repetitive,
98
00:05:03.960 --> 00:05:06.960
but it's always a good idea to write the formula again.
99
00:05:06.960 --> 00:05:09.510
It always is. I love doing it, okay?
100
00:05:09.510 --> 00:05:14.510
So, Q3 plus 1.5
101
00:05:14.640 --> 00:05:16.207
multiplied by IQR.
102
00:05:18.000 --> 00:05:20.820
Okay, oops, that got a little, little.
103
00:05:20.820 --> 00:05:24.810
Okay, let me make sure it fits here.
104
00:05:24.810 --> 00:05:28.410
Okay, so I'm going to just erase this
105
00:05:28.410 --> 00:05:31.890
and write this whole thing so everything fits nicely.
106
00:05:31.890 --> 00:05:33.190
Okay, upper limit
107
00:05:35.700 --> 00:05:39.860
the formula is Q3
108
00:05:41.426 --> 00:05:46.080
plus 1.5 multiplied by IQR.
109
00:05:46.080 --> 00:05:49.113
Okay, so, again, remember from before.
110
00:05:52.050 --> 00:05:54.090
I'm going to again use a different color,
111
00:05:54.090 --> 00:05:55.920
maybe purple this time.
112
00:05:55.920 --> 00:05:58.320
This is going to be the same, right?
113
00:05:58.320 --> 00:06:01.710
So, this is going to be still 7.5
114
00:06:01.710 --> 00:06:04.170
so we don't need to do that calculation again,
115
00:06:04.170 --> 00:06:07.743
all we need to do is take that and add it to Q3.
116
00:06:10.170 --> 00:06:12.180
And this is how you can save time
117
00:06:12.180 --> 00:06:13.710
when you are taking the test
118
00:06:13.710 --> 00:06:16.800
by not repeating the calculations.
119
00:06:16.800 --> 00:06:19.020
Okay, that really saves time.
120
00:06:19.020 --> 00:06:20.023
So, what is our Q3?
121
00:06:21.880 --> 00:06:23.073
Q3 is 11.
122
00:06:24.390 --> 00:06:28.530
So, 11 plus now 7.5.
123
00:06:28.530 --> 00:06:32.250
Okay, so 11 plus 7.5 is going to be what?
124
00:06:32.250 --> 00:06:33.500
18.5.
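
NOTE
A short Python sketch of the two limit formulas, assuming the 1.5 x IQR
rule stated in the lecture. It computes 1.5 x IQR once and reuses it for
both limits, as the instructor suggests. The function name and the
parameter k are illustrative.
```python
def outlier_limits(q1, q3, k=1.5):
    """Return (lower_limit, upper_limit) for the k * IQR rule."""
    iqr = q3 - q1            # interquartile range
    step = k * iqr           # computed once, reused for both limits
    return q1 - step, q3 + step
#
# Quartiles worked out in the lecture: Q1 = 6, Q3 = 11, so IQR = 5.
lower, upper = outlier_limits(6, 11)
print(lower, upper)  # -1.5 18.5
```
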
125
00:06:35.339 --> 00:06:39.150
So, are there outliers?
126
00:06:39.150 --> 00:06:40.680
That's the question.
127
00:06:40.680 --> 00:06:42.300
Yes, there are, right?
128
00:06:42.300 --> 00:06:43.620
We do have outliers.
129
00:06:43.620 --> 00:06:44.880
How do we know that?
130
00:06:44.880 --> 00:06:49.833
Because 35 is bigger than 18.5.
131
00:06:51.120 --> 00:06:54.900
We don't have any outliers in terms of the lower limit
132
00:06:54.900 --> 00:06:58.170
because we don't have any value less than -1.5,
133
00:06:58.170 --> 00:07:01.950
but we do have a value that is greater than our upper limit,
134
00:07:01.950 --> 00:07:03.390
which is 18.5.
135
00:07:03.390 --> 00:07:05.613
So, again, we have an outlier.
136
00:07:07.080 --> 00:07:08.640
Let me use another nice color,
137
00:07:08.640 --> 00:07:10.350
let's use green this time.
138
00:07:10.350 --> 00:07:14.280
Okay, so 35 is actually,
139
00:07:14.280 --> 00:07:16.830
as we know, bigger than 18.5
140
00:07:16.830 --> 00:07:20.013
so yes, we have an outlier.
141
00:07:24.240 --> 00:07:28.823
Yes, an outlier exists, okay?
142
00:07:33.090 --> 00:07:35.700
Because 35 is greater than 18.5,
143
00:07:35.700 --> 00:07:37.623
so that is our outlier.
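
NOTE
A small Python sketch of the final check, assuming the ordered set and the
quartiles from the lecture. Any value below the lower limit or above the
upper limit is flagged. The variable names are illustrative.
```python
counts = [3, 5, 7, 8, 8, 9, 10, 12, 35]
q1, q3 = 6, 11                 # quartiles worked out in the lecture
iqr = q3 - q1                  # 5
lower = q1 - 1.5 * iqr         # -1.5
upper = q3 + 1.5 * iqr         # 18.5
# Keep only the values that fall outside the two limits.
outliers = [x for x in counts if x < lower or x > upper]
print(outliers)  # [35]
```
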
144
00:07:39.390 --> 00:07:43.463
Okay, let me write it clearly.
145
00:07:43.463 --> 00:07:44.630
Okay, so 18.5.
146
00:07:48.240 --> 00:07:52.620
Okay, so we have done
147
00:07:52.620 --> 00:07:54.720
this calculation now a few times
148
00:07:54.720 --> 00:07:57.960
and I really, really encourage students
149
00:07:57.960 --> 00:08:02.190
to go through the calculations multiple times.
150
00:08:02.190 --> 00:08:04.050
The magic number for me has been three.
151
00:08:04.050 --> 00:08:05.790
If I go through a calculation three times,
152
00:08:05.790 --> 00:08:07.170
go through a problem three times
153
00:08:07.170 --> 00:08:10.710
even when I know the answer, I know the process,
154
00:08:10.710 --> 00:08:12.360
even then when I repeat that
155
00:08:12.360 --> 00:08:14.730
it really helps me because you know why?
156
00:08:14.730 --> 00:08:17.580
When you take a test, you are what?
157
00:08:17.580 --> 00:08:19.020
You have a time constraint
158
00:08:19.020 --> 00:08:21.630
so you have to go a little faster
159
00:08:21.630 --> 00:08:23.730
and that really is very helpful
160
00:08:23.730 --> 00:08:27.060
when you have solved a problem a few times
161
00:08:27.060 --> 00:08:31.170
then that process naturally gets
162
00:08:31.170 --> 00:08:33.000
sped up in your mind,
163
00:08:33.000 --> 00:08:34.800
you can do it much faster.
164
00:08:34.800 --> 00:08:35.640
You can think faster
165
00:08:35.640 --> 00:08:37.860
because you have gone through the process a few times.
166
00:08:37.860 --> 00:08:39.780
So, again, I very much encourage you
167
00:08:39.780 --> 00:08:41.897
to go through these problems a couple of times.
168
00:08:41.897 --> 00:08:44.370
Again, I'm showing you a number of problems,
169
00:08:44.370 --> 00:08:46.620
I'm putting a lot more examples this time
170
00:08:46.620 --> 00:08:49.290
because I think it's important to go through these examples
171
00:08:49.290 --> 00:08:52.230
so that will make the understanding clear in your mind
172
00:08:52.230 --> 00:08:55.050
and you will also be able to solve the problems
173
00:08:55.050 --> 00:08:57.603
much faster when you take the test.