This data set was provided by Dr. Djafar Oladi, who recently obtained his Ph.D. from the Faculty of Forestry and Environmental Management at the University of New Brunswick. Dr. Oladi is a member of the faculty of the University of Mazandaran, Iran, and took leave from that university to obtain his Ph.D. He collected the data and analysed them in the course of writing his thesis.
The problem with which he was concerned involved predicting the amount of crown cover in a forest plantation from indirect measurements made by means of satellite (remote sensing) digital images. In 1994, data were gathered comprising ground measurements of crown cover and the corresponding satellite observations. Measurements were made on 8 age classes of trees, the ages being 3, 5, ..., 17 years. The measurements were obtained from forest plantations of two different levels of quality, designated as good and poor.
For the good plantations, 10 regions were selected from each age class (from a single satellite image). Each region consisted of a group of 9 pixels, in a 3 by 3 array, giving a total of 90 pixels in each age class and hence a total of 720 observations. The pixels were aligned with a digital map so that the exact sub-region on the ground corresponding to each pixel could have its amount of crown cover measured directly. Each sub-region measured 30 metres by 30 metres. Sub-regions in which there was a significant amount of damage were discarded from the study, so that in the end there were only 481 observation points in the data from the good plantations. Damaged sub-regions included those which consisted in large part of, say, a section of railway track or road.
For the poor plantations, 5 regions (i.e. 3 by 3 arrays of pixels) were selected from each age class, and the pixels were aligned as for the good ones. Again, sub-regions in which there was substantial damage were discarded. Of the initial 360 sub-regions, only 276 remained after the damaged ones were discarded. In particular, no pixels at all remained in age class 3.
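The initial pixel counts quoted above (720 and 360) follow directly from the sampling design. As a quick arithmetic check, here is a minimal Python sketch; the variable and function names are ours and are not part of the data set.

```python
# Sampling design for the 1994 data, as described in the text.
AGE_CLASSES = list(range(3, 18, 2))   # ages 3, 5, ..., 17 years
PIXELS_PER_REGION = 3 * 3             # each region is a 3 by 3 array of pixels


def initial_pixels(regions_per_class):
    """Number of pixels selected before damaged sub-regions are discarded."""
    return regions_per_class * PIXELS_PER_REGION * len(AGE_CLASSES)


good_initial = initial_pixels(10)     # good plantations: 10 regions per class
poor_initial = initial_pixels(5)      # poor plantations: 5 regions per class

# Counts remaining after damaged sub-regions were discarded (from the text).
good_kept, poor_kept = 481, 276

print(good_initial, poor_initial)
print(f"good retained: {good_kept / good_initial:.1%}")
print(f"poor retained: {poor_kept / poor_initial:.1%}")
```

The retained fractions make it easy to see how much of each sample was lost to damage before any modelling is attempted.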
The 1994 data set as provided for this Case Study includes the variables:
cci - the observed crown cover index. This index is equal to the total crown cover in each sub-region, divided by the area of the sub-region, i.e. by 900 square metres.
bnd1, bnd2, bnd3, bnd4, bnd5, bnd7 - 6 of the 7 bands of remote sensing values. Band 6 was discarded and was not available to us.
quality (good or poor)
and the following three auxiliary variables:
age - the age cohort
ht - the mean height of trees in the sub-region
dbh - diameter at breast height. Again a mean value over each sub-region. A large fraction of the observations of dbh are missing.
All missing values in the data sets are coded as NA.
Note that the purpose of the project was to be able to monitor change in crown cover without the necessity of making expensive ground measurements. Therefore the auxiliary variables should not be used in the final model, since they would not be available in the real situation in which it is wished to apply the model.
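As an illustration of the two points above (missing values coded as NA, and auxiliary variables excluded from the final model), here is a minimal Python sketch. It assumes, purely for illustration, that the data are stored as comma-separated text with a header row of variable names; the actual file layout is not specified here. The two sample rows are invented and are not real observations.

```python
import csv
import io

AUXILIARY = ("age", "ht", "dbh")   # would not be available in operational use


def parse_value(text):
    """Convert a field to float; map the missing-value code NA to None."""
    text = text.strip()
    if text == "NA":
        return None
    try:
        return float(text)
    except ValueError:
        return text                # non-numeric field, e.g. quality


def read_records(handle):
    """Read records from a CSV-like handle (assumed layout, see lead-in)."""
    return [{name: parse_value(value) for name, value in row.items()}
            for row in csv.DictReader(handle)]


def modelling_view(record):
    """Keep the response and satellite bands; drop the auxiliary variables."""
    return {name: value for name, value in record.items()
            if name not in AUXILIARY}


# Invented two-row sample, for illustration only (not real observations).
sample = io.StringIO(
    "cci,bnd1,bnd2,bnd3,bnd4,bnd5,bnd7,quality,age,ht,dbh\n"
    "0.50,60,25,20,70,45,15,good,7,4.2,NA\n"
    "0.20,65,28,24,60,60,22,poor,5,2.9,5.1\n"
)
records = read_records(sample)
usable = [modelling_view(r) for r in records]
print(usable[0])
```

The auxiliary variables remain in `records` for exploratory work; only the view passed to the final model omits them.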
One of Dr. Oladi's concerns was to validate the model, i.e. to try to make sure that the model continues, over time, to predict crown cover accurately. To attempt such validation, another set of observations was collected in 1995. These observations were made on the same age cohorts, i.e. on trees in the age classes 4, 6, ..., 18 years, and were made available to us. The 1995 data comprised observations of two distinct types. The first type came from regions in good plantations. Here 4 regions (36 pixels) were observed in each age class. None of the corresponding sub-regions was damaged, so data of this type comprise the full 288 observations.
The second type consists of observations on damaged sub-regions only. Regions were again selected in groups of 9 pixels (3 by 3 arrays), and then ONLY the pixels corresponding to damaged sub-regions were retained. There were a total of 127 observations of data of this type. Note that the second type of observation (damaged) is very different in nature from the second component (poor) of the 1994 data.
The 1995 data set as provided for this Case Study includes the variables ht, dbh, bnd5, cci and type (undamaged or damaged).
Note that for the 1995 data, only band 5 is available. Dr. Oladi concluded that band 5 was the only relevant band. Since the process of digitizing satellite photos is expensive and time-consuming, he therefore had the 1995 data digitized over band 5 only. (Question: Was Dr. Oladi correct in his conclusion that band 5 is the only relevant band?)
Dr. Oladi's concern was to use a model fitted to the good 1994 data to calculate crown cover for the undamaged 1995 data, so that the predicted values could be compared with the actual observed values. The question arises: Is the model indeed valid? Does it predict the results for 1995 as it ought? This leads to questions of how one conducts formal statistical tests for the validity of such a model.
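The comparison of predicted with observed values can be sketched informally as follows. This is only a starting point under stated assumptions: the straight-line model of cci on bnd5 is a placeholder of ours, not necessarily the model Dr. Oladi used, and the numbers are invented toy values chosen to make the sketch runnable; no real relationship is implied. The sketch fits the line to the "1994" values, predicts the "1995" values, and summarises the prediction errors by their mean (bias) and root mean squared error.

```python
from math import sqrt
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def prediction_errors(model, x_new, y_new):
    """Observed minus predicted values for new data."""
    intercept, slope = model
    return [yi - (intercept + slope * xi) for xi, yi in zip(x_new, y_new)]


# Invented toy values, for illustration only (not real observations).
bnd5_1994 = [40.0, 45.0, 50.0, 55.0, 60.0]
cci_1994 = [0.80, 0.70, 0.60, 0.50, 0.40]
bnd5_1995 = [42.0, 52.0, 58.0]
cci_1995 = [0.78, 0.55, 0.46]

model = fit_line(bnd5_1994, cci_1994)          # fit on the "1994" values
errors = prediction_errors(model, bnd5_1995, cci_1995)

bias = mean(errors)                            # systematic over/under-prediction
rmse = sqrt(mean([e * e for e in errors]))     # typical size of the error

print(f"bias = {bias:.4f}, rmse = {rmse:.4f}")
```

A bias far from zero, or an error spread much larger than the residual spread in the fitted data, would be informal evidence against validity; a formal test would have to go further than this summary.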
Other questions to be addressed should include just how to model the phenomenon at hand. What are the problems and questionable aspects of the procedure? Are there any lurking perils? What role, if any, should the poor data from 1994 and the damaged data from 1995 play?