1 00:00:00,660 --> 00:00:03,310 - [Teacher] This is the last of the Module Three Lectures. 2 00:00:03,310 --> 00:00:05,100 In this lecture, we explore techniques 3 00:00:05,100 --> 00:00:08,250 for summarizing raster data within bounded regions. 4 00:00:08,250 --> 00:00:09,950 Think of this as the raster equivalent 5 00:00:09,950 --> 00:00:13,313 of the vector operation reviewed in the last lecture. 6 00:00:14,470 --> 00:00:16,470 Once again, we're faced with the same problem 7 00:00:16,470 --> 00:00:18,200 of determining how much of one thing 8 00:00:18,200 --> 00:00:21,530 is contained by another, or how much of each type 9 00:00:21,530 --> 00:00:23,943 of that one thing is contained by the other. 10 00:00:25,250 --> 00:00:27,150 First, a short step back to refresh 11 00:00:27,150 --> 00:00:29,193 what we know about zonal operations. 12 00:00:30,180 --> 00:00:32,320 Zonal operations summarize raster values 13 00:00:32,320 --> 00:00:34,940 within a bounding unit or zone. 14 00:00:34,940 --> 00:00:36,610 Those zone boundaries can be represented 15 00:00:36,610 --> 00:00:38,837 with either vector or raster data. 16 00:00:38,837 --> 00:00:41,370 But in this case, to remain consistent with the examples 17 00:00:41,370 --> 00:00:43,460 of the previous lecture, we'll use polygons 18 00:00:43,460 --> 00:00:45,700 to define our zone boundaries. 19 00:00:45,700 --> 00:00:47,800 It's critically important here that each zone 20 00:00:47,800 --> 00:00:49,840 feature a unique ID. 21 00:00:49,840 --> 00:00:53,120 In other words, there are no two discrete bounding polygons 22 00:00:53,120 --> 00:00:55,013 that have the same zone ID value. 23 00:00:55,890 --> 00:00:58,160 Note that you can use a raster dataset 24 00:00:58,160 --> 00:01:00,160 to represent the zones. 25 00:01:00,160 --> 00:01:01,790 But in this case, the zone is defined 26 00:01:01,790 --> 00:01:03,820 by the value of the pixel.
27 00:01:03,820 --> 00:01:06,410 All pixels in the dataset with the same value 28 00:01:06,410 --> 00:01:09,030 make up the zone, regardless of whether those cells 29 00:01:09,030 --> 00:01:10,780 are adjacent to one another or not. 30 00:01:11,670 --> 00:01:15,060 Zonal operations typically produce either a new table, 31 00:01:15,060 --> 00:01:17,430 certainly when you run zonal statistics as table 32 00:01:17,430 --> 00:01:19,320 or tabulate area. 33 00:01:19,320 --> 00:01:21,000 Note that the zonal statistics tool 34 00:01:21,000 --> 00:01:23,630 generates an output raster dataset, 35 00:01:23,630 --> 00:01:26,340 and it's also only able to compute a single statistic 36 00:01:26,340 --> 00:01:27,823 for each run of the tool. 37 00:01:29,630 --> 00:01:32,500 This slide breaks down the key details or parameters 38 00:01:32,500 --> 00:01:35,650 of the two primary approaches I'd like to focus on, 39 00:01:35,650 --> 00:01:38,603 zonal statistics as table and tabulate area. 40 00:01:39,510 --> 00:01:41,810 In the left column, we see the characteristics 41 00:01:41,810 --> 00:01:45,270 of the zonal statistics as table geo processing tool. 42 00:01:45,270 --> 00:01:48,240 As I mentioned earlier, input zones can be either vector 43 00:01:48,240 --> 00:01:52,760 or raster and must use a unique ID for the zone field. 44 00:01:52,760 --> 00:01:54,330 Hopefully you already know why, 45 00:01:54,330 --> 00:01:57,890 but in case not we'll reconsider that on the next slide. 46 00:01:57,890 --> 00:02:01,900 The input data should be a continuous raster dataset. 47 00:02:01,900 --> 00:02:04,520 I say should here, because there is a specific case 48 00:02:04,520 --> 00:02:06,530 where the thematic dataset could be used 49 00:02:06,530 --> 00:02:09,110 and the output will still make sense. 50 00:02:09,110 --> 00:02:12,170 If you want to summarize data from a continuous raster, 51 00:02:12,170 --> 00:02:15,500 zonal statistics as table is the correct tool for the job. 
52 00:02:15,500 --> 00:02:17,990 The output table represents the summary statistics 53 00:02:17,990 --> 00:02:20,853 of the input raster for each of the input zones. 54 00:02:21,700 --> 00:02:24,080 What about tabulate area? 55 00:02:24,080 --> 00:02:26,270 The primary difference here is that the input data 56 00:02:26,270 --> 00:02:27,950 should be thematic. 57 00:02:27,950 --> 00:02:30,210 I say should here because the tool will work 58 00:02:30,210 --> 00:02:33,480 with continuous data, but the output table is likely 59 00:02:33,480 --> 00:02:34,670 to be a mess. 60 00:02:34,670 --> 00:02:37,260 If you're not sure why, consider this again 61 00:02:37,260 --> 00:02:39,130 after you've reviewed the output 62 00:02:39,130 --> 00:02:41,500 of a tabulate area operation. 63 00:02:41,500 --> 00:02:44,640 You still need to specify a unique value for each zone ID 64 00:02:44,640 --> 00:02:46,740 and the output will once again be a table. 65 00:02:48,070 --> 00:02:51,980 So why do we need a unique value for each zone ID? 66 00:02:51,980 --> 00:02:55,170 Look at the typical workflow for a zonal operation. 67 00:02:55,170 --> 00:02:56,950 You compute that summary operation 68 00:02:56,950 --> 00:02:59,060 and produce a tabular output. 69 00:02:59,060 --> 00:03:01,460 How do you plan to map your results? 70 00:03:01,460 --> 00:03:03,550 You're going to join that tabular output 71 00:03:03,550 --> 00:03:06,300 to the input zone dataset. 72 00:03:06,300 --> 00:03:09,650 That zone ID comes from the first step of what is commonly 73 00:03:09,650 --> 00:03:13,180 a three or more step geo processing workflow. 74 00:03:13,180 --> 00:03:16,450 Once the join is implemented, you can query the result, 75 00:03:16,450 --> 00:03:19,480 apply custom symbology or calculate values 76 00:03:19,480 --> 00:03:22,660 of other attributes from the summary statistics.
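To make the unique zone ID point concrete, here is a minimal sketch of the join step in plain Python. It is not the GIS software's join tool; the field names (`zone_id`, `name`, `mean`), the function `join_by_zone`, and all values are hypothetical and chosen only for illustration.

```python
# Hypothetical zone attributes (stand-in for the input zone dataset).
zones = [
    {"zone_id": "A", "name": "Town A"},
    {"zone_id": "B", "name": "Town B"},
]

# Hypothetical zonal summary table: one record per zone ID.
summary = [
    {"zone_id": "A", "mean": 0.25},
    {"zone_id": "B", "mean": 0.60},
]


def join_by_zone(zones, summary, key="zone_id"):
    """Attach summary fields to each zone record, matching on the zone ID.

    Raises ValueError if a zone ID appears twice in the summary table,
    because the join would then be ambiguous.
    """
    lookup = {}
    for row in summary:
        if row[key] in lookup:
            raise ValueError(f"duplicate zone ID in summary: {row[key]!r}")
        lookup[row[key]] = row
    # Zones with no matching summary record keep only their own fields.
    return [{**zone, **lookup.get(zone[key], {})} for zone in zones]


joined = join_by_zone(zones, summary)
for record in joined:
    print(record)
```

If two zones shared an ID, a join like this could not tell which summary record belongs to which polygon, which is the reason for insisting on unique zone IDs before running the zonal operation.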
77 00:03:22,660 --> 00:03:24,630 This is certainly not an exhaustive list 78 00:03:24,630 --> 00:03:27,330 of what might happen after step two. 79 00:03:27,330 --> 00:03:31,200 Computing the summary statistics is just the first step. 80 00:03:31,200 --> 00:03:34,210 Understanding the what and how of the operation 81 00:03:34,210 --> 00:03:36,900 are critical for further interrogation of the data 82 00:03:36,900 --> 00:03:39,650 in order to extract the information you're looking for. 83 00:03:41,190 --> 00:03:43,030 Let's return to our trail buffers example 84 00:03:43,030 --> 00:03:44,680 from the Vector Lecture. 85 00:03:44,680 --> 00:03:47,840 This time however, we'll use zonal statistics as table 86 00:03:47,840 --> 00:03:49,423 to make our calculations. 87 00:03:50,870 --> 00:03:54,160 We're once again using trail buffers shown in blue 88 00:03:54,160 --> 00:03:57,250 and town boundaries, gray with black outlines. 89 00:03:57,250 --> 00:04:00,473 I converted the polygon trail buffer data to raster data. 90 00:04:01,720 --> 00:04:04,300 If we zoom in a bit closer, we get confirmation 91 00:04:04,300 --> 00:04:06,780 that this is indeed a raster dataset. 92 00:04:06,780 --> 00:04:08,700 I can see that stair-step outer edge 93 00:04:08,700 --> 00:04:10,813 of what used to be the polygon boundaries. 94 00:04:12,910 --> 00:04:14,920 In this case, our input data, 95 00:04:14,920 --> 00:04:16,610 that raster that defines whether something 96 00:04:16,610 --> 00:04:19,710 is a trail buffer or not, takes on a value of zero, 97 00:04:19,710 --> 00:04:23,150 not a buffer or one as a buffer. 98 00:04:23,150 --> 00:04:24,930 Look to the right to evaluate the parameters 99 00:04:24,930 --> 00:04:27,710 of the zonal statistics as table operation. 100 00:04:27,710 --> 00:04:30,860 Not surprisingly, I'm going to use that FIPS six code 101 00:04:30,860 --> 00:04:32,053 as my zone ID. 
102 00:04:33,580 --> 00:04:37,030 We run the tool to produce a table of summary statistics, 103 00:04:37,030 --> 00:04:39,170 one record for each of the FIPS six codes 104 00:04:39,170 --> 00:04:40,493 in my input zone data. 105 00:04:41,370 --> 00:04:43,530 At this point, you might be scratching your head saying, 106 00:04:43,530 --> 00:04:45,300 wait a minute, you just introduced 107 00:04:45,300 --> 00:04:48,230 zonal statistics as table as the appropriate tool 108 00:04:48,230 --> 00:04:50,600 for summarizing continuous raster data 109 00:04:50,600 --> 00:04:52,290 and then the first example you show us 110 00:04:52,290 --> 00:04:54,223 is not a continuous raster dataset. 111 00:04:55,100 --> 00:04:56,830 And you're absolutely right. 112 00:04:56,830 --> 00:04:59,730 This is the special case I alluded to earlier. 113 00:04:59,730 --> 00:05:02,220 In this instance, it's important to remember that a pixel 114 00:05:02,220 --> 00:05:04,860 in my input dataset is either a trail buffer 115 00:05:04,860 --> 00:05:09,040 of value of one, or it's not, a value of zero. 116 00:05:09,040 --> 00:05:12,870 The output here has a particularly useful interpretation. 117 00:05:12,870 --> 00:05:15,460 Not surprisingly, we see a minimum value of zero 118 00:05:15,460 --> 00:05:17,750 and a maximum value of one. 119 00:05:17,750 --> 00:05:21,460 The count represents the number of pixels in that zone 120 00:05:21,460 --> 00:05:25,140 and the sum is the sum of all values in that zone. 121 00:05:25,140 --> 00:05:29,070 If you divide the sum by the count, you get the mean. 122 00:05:29,070 --> 00:05:30,970 The mean value in this case represents 123 00:05:30,970 --> 00:05:34,083 the percent of the zone that is covered by that phenomenon. 
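The zero-and-one special case can be sketched in a few lines of plain Python. This is an illustration of the arithmetic only, not the zonal statistics as table tool itself; the tiny grids, the zone labels "A" and "B", and the function name `zonal_stats` are assumptions made up for the example.

```python
def zonal_stats(zone_grid, value_grid):
    """Summarize value_grid per zone for two grids of identical shape.

    Returns {zone_id: {"count", "sum", "min", "max", "mean"}}.
    Cells whose zone is None are treated as outside every zone.
    """
    stats = {}
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone_id, value in zip(zone_row, value_row):
            if zone_id is None:
                continue
            s = stats.setdefault(
                zone_id, {"count": 0, "sum": 0, "min": value, "max": value}
            )
            s["count"] += 1
            s["sum"] += value
            s["min"] = min(s["min"], value)
            s["max"] = max(s["max"], value)
    for s in stats.values():
        s["mean"] = s["sum"] / s["count"]
    return stats


# Hypothetical 2 x 4 rasters: zone labels and a 0/1 trail-buffer mask.
zone_grid = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
buffer_grid = [
    [1, 0, 1, 1],
    [0, 0, 1, 0],
]

stats = zonal_stats(zone_grid, buffer_grid)
print(stats["A"]["mean"])  # fraction of zone A covered by the buffer
print(stats["B"]["mean"])  # fraction of zone B covered by the buffer
```

In this toy case, one of the four cells in zone A is buffer, so the mean is the sum (1) divided by the count (4). Read as a fraction, that is the share of the zone covered, and multiplying by 100 gives the percent.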
124 00:05:35,070 --> 00:05:37,340 In this case, we interpret the first record in the table 125 00:05:37,340 --> 00:05:41,530 to mean that 58% of the town with FIPS six code 23040 126 00:05:43,140 --> 00:05:46,270 is covered by the 500 meter trail buffer. 127 00:05:46,270 --> 00:05:48,970 It's important to understand that this interpretation 128 00:05:48,970 --> 00:05:51,970 only works when you specify a raster dataset 129 00:05:51,970 --> 00:05:55,283 with values of zero and one for your input values. 130 00:05:57,540 --> 00:05:59,890 Let's take a look at a slightly different example 131 00:05:59,890 --> 00:06:02,860 with a more traditional continuous raster. 132 00:06:02,860 --> 00:06:05,003 In this case, a digital elevation model. 133 00:06:06,010 --> 00:06:08,690 We see the input data in the image on the left, 134 00:06:08,690 --> 00:06:11,770 which includes the town boundaries and a DEM with values 135 00:06:11,770 --> 00:06:16,003 that range between 548 and 4,064 feet. 136 00:06:16,910 --> 00:06:18,960 I set up the zonal statistics as table tool 137 00:06:18,960 --> 00:06:22,010 exactly like I did before, but the key difference being 138 00:06:22,010 --> 00:06:24,423 the input raster and the output table name. 139 00:06:25,600 --> 00:06:27,120 When we look at the output table, 140 00:06:27,120 --> 00:06:28,970 we notice that it's got all the same statistics 141 00:06:28,970 --> 00:06:33,850 we saw last time, the min, max, mean and so on 142 00:06:33,850 --> 00:06:36,850 but the way to interpret this is different. 143 00:06:36,850 --> 00:06:39,200 Let's review the top record in the table. 144 00:06:39,200 --> 00:06:40,920 The mean value of that top record 145 00:06:40,920 --> 00:06:44,220 is approximately 1,720 feet. 146 00:06:44,220 --> 00:06:47,330 That means the mean elevation in that particular town 147 00:06:47,330 --> 00:06:50,010 is 1,720 feet. 
148 00:06:50,010 --> 00:06:52,170 We can't tell how much of the town is at 149 00:06:52,170 --> 00:06:54,290 or above that elevation. 150 00:06:54,290 --> 00:06:57,470 Obviously the output is still valid and useful, 151 00:06:57,470 --> 00:06:59,540 but it's important to understand that the interpretation 152 00:06:59,540 --> 00:07:03,003 of the outputs for these two different instances does vary. 153 00:07:04,450 --> 00:07:06,430 One other thing to mention here is this warning 154 00:07:06,430 --> 00:07:09,640 you might receive when running a zonal operation. 155 00:07:09,640 --> 00:07:11,480 The warning states that some zones 156 00:07:11,480 --> 00:07:13,830 may not have been rasterized. 157 00:07:13,830 --> 00:07:16,110 That's okay, it simply means that one or more 158 00:07:16,110 --> 00:07:19,130 of the zone boundaries are so small or oddly shaped 159 00:07:19,130 --> 00:07:21,520 that they did not contain the center point 160 00:07:21,520 --> 00:07:24,840 of any of the pixels in your input raster dataset. 161 00:07:24,840 --> 00:07:27,470 As a result, that particular zone will be ignored 162 00:07:27,470 --> 00:07:28,653 in the calculations. 163 00:07:30,120 --> 00:07:33,083 Okay, now back to our regularly scheduled programming. 164 00:07:34,110 --> 00:07:36,190 Let's look at another case here. 165 00:07:36,190 --> 00:07:38,270 We use zonal statistics as table again, 166 00:07:38,270 --> 00:07:41,510 but this time we've got multiple buffers around the trails, 167 00:07:41,510 --> 00:07:44,907 one at 500 meters with a pixel value of 500 168 00:07:44,907 --> 00:07:49,490 and the other at 1000 meters with a pixel value of 1000. 169 00:07:49,490 --> 00:07:50,420 What do you think? 170 00:07:50,420 --> 00:07:51,870 Is this the right tool for the job 171 00:07:51,870 --> 00:07:54,200 if we're trying to compute the amount of each town 172 00:07:54,200 --> 00:07:56,100 that is comprised of each buffer type?
173 00:07:57,720 --> 00:07:59,870 Let's look at the output table. 174 00:07:59,870 --> 00:08:02,690 Once again, as expected, we see the same listing 175 00:08:02,690 --> 00:08:05,400 of statistical summary information, 176 00:08:05,400 --> 00:08:07,830 but what can we tell from the output? 177 00:08:07,830 --> 00:08:11,800 Remember the 500 meter buffer has a pixel value of 500 178 00:08:11,800 --> 00:08:15,570 and the 1000 meter buffer has a pixel value of 1000. 179 00:08:15,570 --> 00:08:20,010 What does a mean value of 574 actually mean here? 180 00:08:20,010 --> 00:08:21,290 Not much really. 181 00:08:21,290 --> 00:08:24,720 It's certainly not the mean buffer within each town. 182 00:08:24,720 --> 00:08:26,370 If you've got some well-developed math skills, 183 00:08:26,370 --> 00:08:29,050 you could probably back your way into something meaningful. 184 00:08:29,050 --> 00:08:31,100 But for the rest of us, what's the point? 185 00:08:31,980 --> 00:08:34,660 My point here is simply to illustrate that this is not 186 00:08:34,660 --> 00:08:36,010 the right tool for the job. 187 00:08:37,430 --> 00:08:39,680 However, if you use raster calculator 188 00:08:39,680 --> 00:08:42,630 to query your raster inputs, to, for example, 189 00:08:42,630 --> 00:08:46,300 identify all pixels of the 1000 meter buffer 190 00:08:46,300 --> 00:08:49,123 or elevations above 1700 feet in the DEM, 191 00:08:50,570 --> 00:08:52,590 you produce datasets that meet the criteria 192 00:08:52,590 --> 00:08:55,500 of the special case raster that only contains values 193 00:08:55,500 --> 00:08:58,390 of either zero or one, where zero means the pixel 194 00:08:58,390 --> 00:09:01,610 did not meet the criteria (1000 meter buffer 195 00:09:01,610 --> 00:09:05,790 or 1700 foot elevation) and one means that it did. 196 00:09:05,790 --> 00:09:07,610 We've already reviewed an approach to summarize 197 00:09:07,610 --> 00:09:10,260 this type of data so I won't go into that again here.
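The query-then-summarize idea can be sketched in plain Python as follows. This stands in for a raster calculator expression and is not that tool's syntax; the grids, the zone labels, and the function names `reclass_equal` and `zonal_mean` are hypothetical.

```python
def reclass_equal(grid, target):
    """Return a 0/1 grid: 1 where the cell equals target, else 0."""
    return [[1 if value == target else 0 for value in row] for row in grid]


def zonal_mean(zone_grid, value_grid):
    """Mean of value_grid per zone for two grids of identical shape."""
    totals, counts = {}, {}
    for zone_row, value_row in zip(zone_grid, value_grid):
        for zone_id, value in zip(zone_row, value_row):
            totals[zone_id] = totals.get(zone_id, 0) + value
            counts[zone_id] = counts.get(zone_id, 0) + 1
    return {zone_id: totals[zone_id] / counts[zone_id] for zone_id in totals}


# Hypothetical rasters: zones, and buffer classes coded 0, 500, 1000.
zone_grid = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
buffer_grid = [
    [500, 1000, 0, 500],
    [1000, 0, 0, 0],
]

# Mean of the class codes: a number, but not an interpretable one.
raw_mean = zonal_mean(zone_grid, buffer_grid)

# Query first, then summarize: the mean is now a fraction of the zone.
mask_1000 = reclass_equal(buffer_grid, 1000)
share_1000 = zonal_mean(zone_grid, mask_1000)

print(raw_mean)
print(share_1000)
```

The same pattern would apply to the elevation case by swapping the equality test for a threshold such as `value > 1700`; that variant is left out here to keep the sketch short.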
198 00:09:12,400 --> 00:09:15,410 Let's move on to the tabulate area geo processing tool 199 00:09:15,410 --> 00:09:17,133 to address the same questions. 200 00:09:18,740 --> 00:09:21,080 We've got our 500 meter trail buffers 201 00:09:21,080 --> 00:09:23,260 and town boundaries once again. 202 00:09:23,260 --> 00:09:25,980 I set up the tabulate area geo processing tool 203 00:09:25,980 --> 00:09:27,780 as you see on the right. 204 00:09:27,780 --> 00:09:31,160 And, not surprisingly, I need to specify a zone field 205 00:09:31,160 --> 00:09:33,240 as well as the input raster. 206 00:09:33,240 --> 00:09:36,280 The tool produces a table as its output. 207 00:09:36,280 --> 00:09:39,440 A review of the output table reveals the FIPS six code 208 00:09:39,440 --> 00:09:43,690 as expected and the columns for values of zero and one. 209 00:09:43,690 --> 00:09:45,810 Of course, the zero and one represent the classes 210 00:09:45,810 --> 00:09:48,110 of the input raster where zero means the pixel 211 00:09:48,110 --> 00:09:51,190 is not a trail buffer and vice versa. 212 00:09:51,190 --> 00:09:53,120 Now it's important to note here that these numbers 213 00:09:53,120 --> 00:09:56,490 represent the area of each pixel class or value 214 00:09:56,490 --> 00:09:58,950 within that particular zone boundary. 215 00:09:58,950 --> 00:10:02,100 If we look at that top row, we see that the value one number 216 00:10:02,100 --> 00:10:05,390 for the FIPS six code 23040 217 00:10:05,390 --> 00:10:08,860 is about 54.5 million square meters. 218 00:10:08,860 --> 00:10:10,580 You can always review the layer properties 219 00:10:10,580 --> 00:10:13,340 to determine the linear and areal units of the data 220 00:10:13,340 --> 00:10:14,340 you're working with. 221 00:10:15,450 --> 00:10:17,570 Okay, that was a simple example. 222 00:10:17,570 --> 00:10:19,660 Let's muddy the waters a bit.
223 00:10:19,660 --> 00:10:22,380 This time, we'll evaluate two trail buffers, 224 00:10:22,380 --> 00:10:25,970 one at 500 meters and the other 1000 meters. 225 00:10:25,970 --> 00:10:28,520 Of course, we also have the non buffer pixels 226 00:10:28,520 --> 00:10:29,593 with value zero. 227 00:10:30,560 --> 00:10:34,000 If I set up the tabulate area tool in the exact same way, 228 00:10:34,000 --> 00:10:35,850 what do you suppose will happen here? 229 00:10:37,770 --> 00:10:39,660 After I run the geo processing operation 230 00:10:39,660 --> 00:10:41,560 and open the resulting table, 231 00:10:41,560 --> 00:10:44,690 I see a format that looks very similar to the previous one 232 00:10:44,690 --> 00:10:47,450 with the addition of the 1000 meter buffer. 233 00:10:47,450 --> 00:10:52,450 That's great. One more quick aside here, 234 00:10:52,470 --> 00:10:54,200 in this example, I illustrate the use 235 00:10:54,200 --> 00:10:55,630 of the tabulate area tool 236 00:10:55,630 --> 00:10:57,850 on the National Land Cover Dataset. 237 00:10:57,850 --> 00:10:58,700 That's a thematic raster 238 00:10:58,700 --> 00:11:02,340 with more than 25 pixel classes or values. 239 00:11:02,340 --> 00:11:05,010 I set the tool parameters just like we've seen before 240 00:11:05,010 --> 00:11:06,343 and run the operation. 241 00:11:07,360 --> 00:11:09,630 The output is as expected. 242 00:11:09,630 --> 00:11:12,120 I only show this example to point out that it doesn't matter 243 00:11:12,120 --> 00:11:14,000 how many different values are represented 244 00:11:14,000 --> 00:11:15,940 in your input raster. 245 00:11:15,940 --> 00:11:18,540 In the end, each value will have a corresponding column 246 00:11:18,540 --> 00:11:21,030 in the output table where the cell value represents 247 00:11:21,030 --> 00:11:24,323 the area of that pixel class within a given boundary.
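The logic behind a tabulate area output can be sketched in plain Python: count the cells of each class inside each zone and multiply by the cell area. This is an illustration only, not the tool's implementation; the grids, the 30 meter cell size, and the function name `tabulate_area` are assumptions made up for the example.

```python
def tabulate_area(zone_grid, class_grid, cell_size):
    """Return {zone_id: {class_value: area}} for two grids of identical shape.

    cell_size is the side length of a square cell, so each cell
    contributes cell_size * cell_size to its class within its zone.
    Classes that never occur in a zone are simply absent from its row.
    """
    cell_area = cell_size * cell_size
    table = {}
    for zone_row, class_row in zip(zone_grid, class_grid):
        for zone_id, class_value in zip(zone_row, class_row):
            row = table.setdefault(zone_id, {})
            row[class_value] = row.get(class_value, 0) + cell_area
    return table


# Hypothetical rasters: zones, and buffer classes coded 0, 500, 1000.
zone_grid = [
    ["A", "A", "B", "B"],
    ["A", "A", "B", "B"],
]
class_grid = [
    [0, 500, 1000, 1000],
    [0, 0, 500, 0],
]

areas = tabulate_area(zone_grid, class_grid, cell_size=30)
for zone_id, row in areas.items():
    print(zone_id, row)
```

Each distinct class value becomes its own column, which is why a continuous raster with thousands of distinct values would produce an unwieldy table.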
248 00:11:26,090 --> 00:11:29,290 If we compare the outputs from the zonal statistics as table 249 00:11:29,290 --> 00:11:32,770 and tabulate area operations, we note similarities 250 00:11:32,770 --> 00:11:35,450 and differences in the two approaches. 251 00:11:35,450 --> 00:11:36,610 Of course, it doesn't make sense 252 00:11:36,610 --> 00:11:39,730 to calculate summary statistics on thematic values. 253 00:11:39,730 --> 00:11:41,740 They're not really numbers. 254 00:11:41,740 --> 00:11:45,420 The primary similarity here is the zone field definition. 255 00:11:45,420 --> 00:11:47,880 And we can see the FIPS six code throughout all four 256 00:11:47,880 --> 00:11:49,680 of the tables that are present here. 257 00:11:52,250 --> 00:11:54,490 Let's look at just two of the outputs, 258 00:11:54,490 --> 00:11:57,200 the two that dealt with a special class of input raster 259 00:11:57,200 --> 00:11:59,520 with values of zero and one. 260 00:11:59,520 --> 00:12:02,900 Can you see any similarities between the two tables? 261 00:12:02,900 --> 00:12:04,300 Look at the top row in each, 262 00:12:04,300 --> 00:12:06,410 they both refer to the same town, 263 00:12:06,410 --> 00:12:08,460 which you can tell by the FIPS six codes. 264 00:12:09,470 --> 00:12:13,010 If we multiply the area by the mean, 265 00:12:13,010 --> 00:12:16,363 we derive a value of 54.5 million square meters. 266 00:12:17,270 --> 00:12:19,560 If we look instead at the tabulate area result, 267 00:12:19,560 --> 00:12:22,260 we see almost the exact same value. 268 00:12:22,260 --> 00:12:23,680 We wandered to the same result, 269 00:12:23,680 --> 00:12:25,780 along two very different paths, 270 00:12:25,780 --> 00:12:28,703 zonal statistics as table and tabulate area. 271 00:12:30,130 --> 00:12:32,630 So where do we go from here?
272 00:12:32,630 --> 00:12:35,860 Remember that a join operation, joining the resulting table 273 00:12:35,860 --> 00:12:39,190 from a zonal operation to the input zone data, 274 00:12:39,190 --> 00:12:43,000 is a common next step after a zonal calculation. 275 00:12:43,000 --> 00:12:47,170 After that, query the data, apply custom symbology, 276 00:12:47,170 --> 00:12:50,190 calculate attribute values, or create new attributes 277 00:12:50,190 --> 00:12:52,080 based on your findings. 278 00:12:52,080 --> 00:12:54,160 That's it for Module Three Lectures. 279 00:12:54,160 --> 00:12:55,810 We covered a lot of ground this week. 280 00:12:55,810 --> 00:12:59,280 Hopefully some of it was review and some of it was new. 281 00:12:59,280 --> 00:13:01,550 If any of this remains a mystery to you, 282 00:13:01,550 --> 00:13:03,240 one, don't panic 283 00:13:03,240 --> 00:13:05,800 and two, post questions and comments to Yellowdig 284 00:13:05,800 --> 00:13:07,700 and let's keep the conversation going.