Hint: Review the notes on making comparative box plots. Remember that when you import data into an R data frame, all observations of the same quantity (in this case the measured speed of light in km/sec minus 299000) should go in one column, and any categorical variables (in this case, the trial number) should go in their own columns beside it. Categorical variables should be entered as factors so they don't get treated as numbers later. Here is a simple way to generate the column of indicators for trial; assume that you have called the data frame michelson.
michelson$trial <- factor(rep(1:5, rep(20, 5)))
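As a rough sketch of what that line produces, the code below builds the factor and uses it for comparative box plots. It assumes R's built-in morley data frame as a stand-in for the imported table (it may not match the textbook data exactly), and the column name speed is hypothetical.

```r
# Stand-in data (an assumption): R's built-in morley data frame, whose
# Speed column holds 100 measurements in 5 blocks of 20.
michelson <- data.frame(speed = morley$Speed)   # 'speed' is a hypothetical name

# rep(1:5, rep(20, 5)) repeats each of 1, 2, ..., 5 twenty times, in order
michelson$trial <- factor(rep(1:5, rep(20, 5)))

table(michelson$trial)       # number of observations in each trial
is.factor(michelson$trial)   # check that trial is a factor, not a number

# An equivalent way to write the same indicator column
trial2 <- factor(rep(1:5, each = 20))
identical(michelson$trial, trial2)

# Comparative box plots: one box per level of the factor
boxplot(speed ~ trial, data = michelson,
        xlab = "Trial", ylab = "Speed of light - 299000 (km/sec)")
```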
R doesn't have a digidot plot, so do stem-and-leaf and time series plots separately. Also do a lag-1 scatter plot. Is there evidence of trend, a shift in mean or autocorrelation?
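One possible way to produce the three displays is sketched below. The vector x is simulated purely for illustration; replace it with the real observations in time order.

```r
set.seed(1)
# Hypothetical series in time order (simulated, NOT the textbook data)
x <- 50 + cumsum(rnorm(40))
n <- length(x)

stem(x)                                  # stem-and-leaf display

plot(x, type = "b",                      # time series plot in observation order
     xlab = "Observation order", ylab = "x")

# Lag-1 scatter plot: each observation against the one just before it
plot(x[-n], x[-1],
     xlab = "x at time t - 1", ylab = "x at time t")

cor(x[-n], x[-1])                        # sample lag-1 correlation
```

Dropping the last value with x[-n] and the first with x[-1] lines up each observation with its predecessor.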
Do an exploratory data analysis of the data in Table 12-14 (p 451).
Question 4: 12-64 (p 463)
Do graphical analyses using a scatterplot matrix to determine which variables affect thrust. Use different plotting symbols and colours for points with low ambient temperature. State your conclusions. (The question asks for linear regression and tests of hypothesis but of course you are not expected to do those for this assignment!)
Warning: The data are in Table 12-20 (p 456); the textbook has the data for 12-64 and 12-66 interchanged. You can get the file for Table 12-20 on the CD if you go to 12-66, but it is missing the "fuel flow rate". To simulate the real-life frustrations of working with data, I should have left you to discover this for yourself!
Hint: If the data are in a data frame called jet in R, then pairs(jet) will give a scatterplot matrix. If ambient temperature is in a column called ambtemp, what does pch = 1 + (jet$ambtemp < 90) do if you add it to the pairs call? How does it work?
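To experiment with this hint before the real file is loaded, a small sketch follows. The data frame is simulated, and the column names thrust and fuel are invented for illustration; only ambtemp and the cutoff of 90 come from the hint.

```r
set.seed(2)
# Hypothetical stand-in for the jet data (simulated values, invented columns)
jet <- data.frame(thrust  = rnorm(30, mean = 4500, sd = 200),
                  fuel    = rnorm(30, mean = 30000, sd = 1500),
                  ambtemp = round(runif(30, min = 80, max = 105)))

# A logical vector used in arithmetic is coerced to 0 (FALSE) or 1 (TRUE)
sym <- 1 + (jet$ambtemp < 90)
table(low_temp = jet$ambtemp < 90, sym)

# Use the same codes for plotting symbol and colour
pairs(jet, pch = sym, col = sym)
```

Inspecting the table of sym against the logical condition is a quick way to see which symbol code each group of points receives.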
Question 5: 14-8 (p 520)
Do graphical analyses using comparative box plots to compare crack growth between the loading frequencies and between the environment conditions. Give "interaction plots" like the one in Figure 14-8 (p 516): plot the mean crack growth against environment condition, separately for each loading frequency, and plot the mean crack growth against loading frequency, separately for each environment condition. Repeat the graphs with crack growth on a log scale. State your conclusions. (The question asks for a two-factor analysis of variance but you will do that in Assignment #3.)
Hints: Enter the data as three columns in a data frame, putting the crack growth in the first column, a code for loading frequency in the second, and a code for environment condition in the third. To plot on a log scale, you can add the option log="y" to the plot or boxplot command to transform the Y-axis, or you can compute a new column of log-transformed crack growths. Will the boxplots look the same either way? Will the interaction plots look the same either way?
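The two approaches to the log scale can be tried side by side with a sketch like the one below. The data are simulated, and the factor levels and replicate count are assumptions, not the values from the textbook; interaction.plot is one convenient base R function for plotting group means.

```r
set.seed(3)
# Hypothetical layout: 3 frequencies x 3 environments x 4 replicates
design <- expand.grid(rep = 1:4,
                      freq = c("f1", "f2", "f3"),
                      env  = c("e1", "e2", "e3"))

# Three columns as in the hint: growth, frequency code, environment code
crack <- data.frame(growth = exp(rnorm(nrow(design), mean = 1)),  # simulated
                    freq   = factor(design$freq),
                    env    = factor(design$env))
crack$loggrowth <- log(crack$growth)     # new log-transformed column

# Approach 1: raw data, log-scaled Y-axis
boxplot(growth ~ freq, data = crack, log = "y")
# Approach 2: box plot of the log-transformed column
boxplot(loggrowth ~ freq, data = crack)

# Interaction plots: mean response against one factor, one line per level
# of the other factor
with(crack, interaction.plot(env, freq, growth))
with(crack, interaction.plot(freq, env, growth))
with(crack, interaction.plot(env, freq, loggrowth))
```

Comparing the plots from the two approaches is a direct way to explore the questions posed in the hint.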
Question 6: Environmental Data
The following air quality measurements were taken downwind of a coal burning electrical generating station. Construct and interpret the following plots:
- box plot for sulphur concentration versus time of day;
- box plot for sulphur concentration versus date;
- time series plot.
Based on the plots you have constructed, do you think it is important to consider the time of day in an air-quality sampling regime? In your opinion, do the data provided tell the whole story regarding sulphur contamination in the air at this location, or do more data need to be collected? Justify your answer, and discuss what other factors might be considered.
Sulphur Concentration (ppm)

Date            12:00am   6:00am   12:00pm   6:00pm
July 4 1990          22       34        35       43
July 5 1990          30       18         9        4
July 6 1990          18       27        23       11
July 7 1990          17       13        10       11
July 8 1990          23       21        16        9
July 9 1990          16        8         9       14
July 10 1990         15        3         2        1
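
One way to enter these measurements in R and draw the requested plots is sketched below. It follows the one-column-per-quantity layout with factors for the categorical variables; the names sulphur, date, time and ppm are illustrative choices.

```r
times <- c("12:00am", "6:00am", "12:00pm", "6:00pm")
dates <- paste("July", 4:10)

# Long format: one row per measurement, in time order
sulphur <- data.frame(
  date = factor(rep(dates, each = 4), levels = dates),
  time = factor(rep(times, times = 7), levels = times),
  ppm  = c(22, 34, 35, 43,
           30, 18,  9,  4,
           18, 27, 23, 11,
           17, 13, 10, 11,
           23, 21, 16,  9,
           16,  8,  9, 14,
           15,  3,  2,  1))

boxplot(ppm ~ time, data = sulphur,
        xlab = "Time of day", ylab = "Sulphur concentration (ppm)")
boxplot(ppm ~ date, data = sulphur,
        xlab = "Date (1990)", ylab = "Sulphur concentration (ppm)")

# Time series plot, with a tick at the first reading of each day
plot(sulphur$ppm, type = "b", xaxt = "n",
     xlab = "Date (1990)", ylab = "Sulphur concentration (ppm)")
axis(1, at = seq(1, 28, by = 4), labels = dates)
```

Setting the factor levels explicitly keeps the dates and times in chronological order on the plots rather than alphabetical order.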