S3N03 Assignment #3

Statistics 3N03 - Assignment #3 Solutions

2003-12-02

Due: 2003-12-02 18:00

Use R to do the graphics on this assignment. Do the ANOVA calculations in R and with your calculator, and submit both. The text references are to Montgomery & Runger, Applied Statistics and Probability for Engineers, 3rd edition.

Full marks = 125

Question 1

Figure 8-4 [Graph / 3]

> xgr <- seq(-4,4,length=50)
> plot(xgr, dnorm(xgr), type = "l", lty = 1, xlab = "x", ylab ="f(x)")
> lines(xgr,dt(xgr,10),lty=2)
> lines(xgr,dt(xgr,1),lty=3)
> legend(1.8,.38,c("infinite df","10 df","1 df"),lty=1:3)
> title("t density")

Figure 8-8 [Graph / 3]

> xgr <- seq(0,30,length=50)
> plot(xgr, dchisq(xgr, 2), type = "l", lty = 1, xlab = "x", ylab ="f(x)")
> lines(xgr,dchisq(xgr,5),lty=2)
> lines(xgr,dchisq(xgr,10),lty=3)
> legend(15,.4,c("2 df","5 df","10 df"),lty=1:3)
> title("Chi-square density")

Figure 10-4 [Graph / 3]

> xgr <- seq(0,8,length=90)
> plot(xgr, df(xgr,5,15), type = "l", lty = 1, xlab = "x", ylab ="f(x)")
> lines(xgr,df(xgr,5,5),lty=3)
> legend(3,.6,c("F(5,15)","F(5,5)"),lty=c(1,3))
> title("F density")

Question 2

(a) [CI / 3, sample size / 3]

The confidence interval is (29.48, 47.92).

> qt(.975,8)*12/sqrt(9)
[1] 9.224016
> 38.7 + c(-1,1)*qt(.975,8)*12/sqrt(9)
[1] 29.47598 47.92402

To solve for the sample size, we have to solve qt(.975, n-1)*12/sqrt(n) = 2 to get n. This is complicated because we have to know n to get qt(.975, n-1), but we know that if n is large qt(.975, n-1) will be close to 2 and this gives the solution n = 144. (If the solution were not an integer, we would round up to the next integer.)

> (2*12/2)^2
[1] 144

(b) [Calculation / 6]

When the process mean is 100 g, the probability that a given pellet is within limits is 0.8904, and this becomes 0.7799 when the mean shifts to 102 g. Compute the binomial probability of getting exactly 1 out of 4 pellets within the limits for the normal process and for the shifted process, then use Bayes' Theorem to compute the probability that the process mean has shifted, given that 1 out of 4 was within the limits. The answer is 44.06%.

> pr1 <- pnorm(104,100,2.5)-pnorm(96,100,2.5)
> pr1
[1] 0.8904014
> pr2 <- pnorm(104,102,2.5)-pnorm(96,102,2.5)
> pr2
[1] 0.779947
> dbinom(1,4,pr1)
[1] 0.004688789
> dbinom(1,4,pr2)
[1] 0.03324349
> dbinom(1,4,pr2)*0.1/(dbinom(1,4,pr2)*0.1 + dbinom(1,4,pr1)*0.9)
[1] 0.4406462

Question 3

(a) [Choice of either paired t-test or regression / 3, graph / 2, analysis / 4, assumptions / 2, conclusions / 1, hand calculation verified by R / 2]

Either a paired t-test or a simple linear regression is a valid analysis; a two-sample t-test is NOT valid.

> chlorine
    msi   sib  diff
1  0.39  0.36 -0.03
2  0.84  1.35  0.51
3  1.76  2.56  0.80
4  3.35  3.92  0.57
5  4.69  5.35  0.66
6  7.70  8.33  0.63
7 10.52 10.70  0.18
8 10.92 10.91 -0.01

Paired t-test

> t.test(chlorine$sib,chlorine$msi,pair=T)

        Paired t-test

data:  chlorine$sib and chlorine$msi 
t = 3.6454, df = 7, p-value = 0.008228
alternative hypothesis: true difference in means is not equal to 0 
95 percent confidence interval:
 0.1453686 0.6821314 
sample estimates:
mean of the differences 
                0.41375 

> stem(chlorine$diff)

  The decimal point is 1 digit(s) to the left of the |

  -0 | 31
   0 | 8
   2 | 
   4 | 17
   6 | 36
   8 | 0

Assumptions:

Normality (see graph)
Independence (can't test).

Conclusions:

There is evidence (P = 0.008) that mean MSI and mean SIB are not equal.

Simple Linear Regression

> fitchlor<-lm(sib~msi,chlorine)
> summary(fitchlor)

Call:
lm(formula = sib ~ msi, data = chlorine)

Residuals:
     Min       1Q   Median       3Q      Max 
-0.56096 -0.13956  0.05219  0.24941  0.30371 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  0.54083    0.18705   2.891   0.0276 *  
msi          0.97469    0.02928  33.283  4.9e-08 ***
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 0.327 on 6 degrees of freedom
Multiple R-Squared: 0.9946,     Adjusted R-squared: 0.9937 
F-statistic:  1108 on 1 and 6 DF,  p-value: 4.895e-08 

> plot(sib~msi, data=chlorine)
> abline(fitchlor)

Assumptions:

Normality (residuals look OK)
Homoscedasticity (residuals look OK)
Independence (can't test)
Linearity (looks OK)

Conclusions:

There is strong evidence of a linear relationship between MSI and SIB, the slope is very close to 1 but the intercept is not zero.

(b) [Choice of single factor ANOVA / 2, graph / 2, analysis / 4, assumptions / 2, conclusions / 1, hand calculation verified by R / 3]

The correct analysis is as a single factor ANOVA; the missing value can be omitted, leaving the design unbalanced.

> alcohol
    conc lab
1  85.06   1
2  85.25   1
3  84.87   1
4  84.99   2
5  84.28   2
6  84.48   3
7  84.72   3
8  85.10   3
9  84.10   4
10 84.55   4
11 84.05   4

> anova(lm(conc~as.factor(lab), data=alcohol))
Analysis of Variance Table

Response: conc
          Df  Sum Sq Mean Sq F value Pr(>F)  
lab        3 1.05823 0.35274  3.6778 0.0708 .
Residuals  7 0.67138 0.09591                 
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

> boxplot(conc~lab, data=alcohol, ylab="alcohol concentration", xlab="Laboratory")

Assumptions:

Normality (see plot, seems OK, samples too small to test)
Independence (no way to test, check experimental conditions)
Homoscedasticity (see plot, seems OK, samples too small to test)

Conclusions:

There is some evidence (P = 0.07) that the 4 labs do not all give the same mean % alcohol determination.

Question 4

(a) Analysis [Fit & plot / 4, ANOVA / 8, CI / 3, assumptions / 3, conclusions / 2, hand calculation verified by R / 4]

Note how the residual mean square can be extracted as entry [3,3] of the ANOVA table, and upper and lower 95% confidence limits (0.0571, 0.6673) computed in a single statement. The "as.factor(age)" line in the ANOVA table is the lack of fit test, it comes from adding age as a factor to the model after age as linear regressor.

> pipecrack
  age  load
1  20 11.45
2  20 10.42
3  20 11.14
4  25 10.84
5  25 11.17
6  25 10.54
7  31  9.47
8  31  9.19
9  31  9.54

> plot(load~age, data=pipecrack)
> anova(lm(load~age+as.factor(age), data=pipecrack))
Analysis of Variance Table

Response: load
               Df Sum Sq Mean Sq F value   Pr(>F)   
age             1 4.0362  4.0362 29.3306 0.001639 **
as.factor(age)  1 0.6605  0.6605  4.7996 0.070997 . 
Residuals       6 0.8257  0.1376                    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> mseq3<-anova(lm(load~age+as.factor(age),pipecrack))[3,3]
> mseq3
          
0.1376111 
> 6*mseq3/c(qchisq(.975,6),qchisq(.025,6))
[1] 0.05714203 0.66728937
> abline(lm(load~age, data=pipecrack))

Since the lack of fit test did not reject the hypothesis of linearity, an alternative analysis without the lack of fit sum of squares is valid; this gives 7 df for the residual mean square, and a different confidence interval for the residual variance (0.0928, 0.8794) based on 7 df.

> anova(lm(load~age, data=pipecrack))
Analysis of Variance Table

Response: load
          Df Sum Sq Mean Sq F value   Pr(>F)   
age        1 4.0362  4.0362  19.011 0.003313 **
Residuals  7 1.4861  0.2123                    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

> 7*anova(lm(load~age,pipecrack))[2,3]/c(qchisq(.975,7),qchisq(.025,7))
[1] 0.0928099 0.8794427
>
> coef(lm(load~age,pipecrack))
(Intercept)         age 
 14.1904029  -0.1489194 
> summary(lm(load~age,pipecrack))

Call:
lm(formula = load ~ age, data = pipecrack)

Residuals:
    Min      1Q  Median      3Q     Max 
-0.7920 -0.1039 -0.0339  0.2380  0.7026 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 14.19040    0.87877   16.15  8.5e-07 ***
age         -0.14892    0.03415   -4.36  0.00331 ** 
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 

Residual standard error: 0.4608 on 7 degrees of freedom
Multiple R-Squared: 0.7309,     Adjusted R-squared: 0.6924 
F-statistic: 19.01 on 1 and 7 DF,  p-value: 0.003313

Assumptions:

Normal (Should look at residuals, but OK by plot)
Homoscedastic (Should look at residuals, but OK by plot)
Independent (Can't test. Look at the conduct of the experiment)
Linear (OK by lack of fit test).

Conclusions:

Some evidence of nonlinearity (P = 0.07) but the hypothesis of linearity over the range of ages studied is acceptable at the 5% level. There is evidence (P < 0.01) that load at first crack decreases with age of pipe.

(b) Prediction [Calculation /2, discussion /2]

Note how in R the new age values to be used for the prediction have to be supplied in a data frame.

> predict(lm(load~age, pipecrack), newdata=data.frame(age=c(22,100)))
         1          2 
10.9141758 -0.7015385

The prediction at 22 days is an interpolation so it might be OK, except that the true relationship may not be exactly linear. The prediction at 100 days is an extreme extrapolation which gives an impossible negative value, so it is not reliable and should not be used.

Question 5

Analysis [ANOVA / 8, either plot / 4, CI / 4, conclusions / 4, hand calculation verified by R / 5]

The analysis is as a two-factor design with interaction. The 95% CI for the residual variance is (3.101, 21.85) on 9 df.

> copperex
   warp temp pct
1    17   50  40
2    20   50  40
3    16   50  60
4    21   50  60
5    24   50  80
6    22   50  80
9    12   75  40
10    9   75  40
11   18   75  60
12   13   75  60
13   17   75  80
14   12   75  80
25   21  125  40
26   17  125  40
27   23  125  60
28   21  125  60
29   23  125  80
30   22  125  80
> copperex$temp <- as.factor(copperex$temp)
> copperex$pct <- as.factor(copperex$pct)
> interactplot(copperex$warp, copperex$pct, copperex$temp)
> interactplot(copperex$warp, copperex$temp, copperex$pct)

> anova(lm(warp~pct*temp, data=copperex))
Analysis of Variance Table

Response: warp
          Df  Sum Sq Mean Sq F value   Pr(>F)   
pct        2  49.778  24.889  3.7966 0.063739 . 
temp       2 204.778 102.389 15.6186 0.001184 **
pct:temp   4  19.556   4.889  0.7458 0.584683   
Residuals  9  59.000   6.556                    
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1

> anova(lm(warp~pct*temp,copperex))[4,3]
         
6.555556 
> 9*anova(lm(warp~pct*temp,copperex))[4,3]/c(qchisq(.975,9),qchisq(.025,9))
[1]  3.101547 21.848700

Assumptions:

Normality (look at residuals)
Homoscedasticity (look at residuals
Independence (can't test).

Conclusions:

There is no evidence (P = 0.585) from these data that % copper and temperature interact. There is strong evidence (P = 0.001) that temperature affects warping, but less evidence (P = 0.064) that % copper affects warping, over the range of % copper studied.

Question 6

Analysis [Untransformed and log-transformed ANOVA / 10, residual plots / 4, conclusions / 4, hand calculation verified by R / 5]

The analysis is as a two-factor design with interaction. The interaction is significant at the 5% level, and is in fact highly significant, so there is no need to test the main effects separately. Cyclic loading frequency and environmental conditions both affect fatigue crack growth, and the effect of loading frequency is different under different environments. The ANOVA tables from untransformed and log-transformed data are almost the same.

I have shown two of the diagnostic plots, "Residuals vs. Fitted" and a Normal probability plot of the residuals. You are not expected to do the residual plots by hand. The log-transformed data are more homoscedastic and more normal than the untransformed data, and one outlier (observation #31) is pulled in closer to the rest of the data, but that does not seem to affect the analysis very much.

(In Assignment #1 you did a simple graphical analysis of these data; does this more sophisticated analysis tell us anything more than the graphical analysis?)

> fitcrack <- lm(growth~freqf*envir, data=crack)
> anova(fitcrack)
Analysis of Variance Table

Response: growth
            Df  Sum Sq Mean Sq F value    Pr(>F)    
freqf        2 209.893 104.946  522.40 < 2.2e-16 ***
envir        2  64.252  32.126  159.92 1.076e-15 ***
freqf:envir  4 101.966  25.491  126.89 < 2.2e-16 ***
Residuals   27   5.424   0.201                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> plot(fitcrack)

> fitcrackl <- lm(log(growth)~freqf*envir, data=crack)
> anova(fitcrackl)
Analysis of Variance Table

Response: log(growth)
            Df Sum Sq Mean Sq F value    Pr(>F)    
freqf        2 7.5702  3.7851 404.095 < 2.2e-16 ***
envir        2 2.3576  1.1788 125.849 2.061e-14 ***
freqf:envir  4 3.5284  0.8821  94.172 1.885e-15 ***
Residuals   27 0.2529  0.0094                      
---
Signif. codes:  0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 
> plot(fitcrackl)

Statistics 3N03 - Assignment #3 Solutions

2003-12-02

Due: 2003-12-02 18:00

Full marks = 125

Question 1

Figure 8-4 [Graph / 3]

Figure 8-8 [Graph / 3]

Figure 10-4 [Graph / 3]

Question 2

(a) [CI / 3, sample size / 3]

(b) [Calculation / 6]

Question 3

(a) [Choice of either paired t-test or regression / 3, graph / 2, analysis / 4, assumptions / 2, conclusions / 1, hand calculation verified by R / 2]

Paired t-test

Simple Linear Regression

(b) [Choice of single factor ANOVA / 2, graph / 2, analysis / 4, assumptions / 2, conclusions / 1, hand calculation verified by R / 3]

Question 4

(a) Analysis [Fit & plot / 4, ANOVA / 8, CI / 3, assumptions / 3, conclusions / 2, hand calculation verified by R / 4]

Assumptions:

Conclusions:

(b) Prediction [Calculation /2, discussion /2]

Question 5

Analysis [ANOVA / 8, either plot / 4, CI / 4, conclusions / 4, hand calculation verified by R / 5]

Assumptions:

Conclusions:

Question 6

Analysis [Untransformed and log-transformed ANOVA / 10, residual plots / 4, conclusions / 4, hand calculation verified by R / 5]

Statistics 3N03/3J04