[12 marks]
Use R to re-draw Figs. 8-11, 8-15 and 9-4 from the text.
> xgr <- seq(-4,4,length=50) > plot(xgr, dnorm(xgr), type = "l", lty = 1, xlab = "x", ylab ="f(x)") > lines(xgr,dt(xgr,10),lty=2) > lines(xgr,dt(xgr,1),lty=3) > legend(1.8,.38,c("infinite df","10 df","1 df"),lty=1:3) > title("t density") > xgr <- seq(0,30,length=50) > plot(xgr, dchisq(xgr, 2), type = "l", lty = 1, xlab = "x", ylab ="f(x)") > lines(xgr,dchisq(xgr,5),lty=2) > lines(xgr,dchisq(xgr,10),lty=3) > legend(15,.4,c("2 df","5 df","10 df"),lty=1:3) > title("Chi-square density") > xgr <- seq(0,8,length=90) > plot(xgr, df(xgr,5,15), type = "l", lty = 1, xlab = "x", ylab ="f(x)") > lines(xgr,df(xgr,5,5),lty=3) > legend(3,.6,c("F(5,15)","F(5,5)"),lty=c(1,3)) > title("F density")
> soil sur sub diff 1 6.55 6.78 -0.23 2 5.98 6.14 -0.16 3 5.59 5.80 -0.21 4 6.17 5.91 0.26 5 5.92 6.10 -0.18 6 6.18 6.01 0.17 7 6.43 6.18 0.25 8 5.68 5.88 -0.20 > plot(soil$sur,soil$sub,pch=19) > fitsoil <- lm(sub~sur,data=soil) > abline(fitsoil) > summary(fitsoil) Call: lm(formula = sub ~ sur, data = soil) Residuals: Min 1Q Median 3Q Max -0.26473 -0.17263 0.03719 0.09778 0.34110 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 1.8854 1.4416 1.308 0.2388 sur 0.6952 0.2375 2.927 0.0264 * --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 0.2117 on 6 degrees of freedom Multiple R-Squared: 0.5882, Adjusted R-squared: 0.5196 F-statistic: 8.57 on 1 and 6 DF, p-value: 0.02637 > anova(fitsoil) Analysis of Variance Table Response: sub Df Sum Sq Mean Sq F value Pr(>F) sur 1 0.38409 0.38409 8.57 0.02637 * Residuals 6 0.26891 0.04482 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
There is some evidence (P = 0.026) that the subsoil pH varies with surface pH. Surface pH could be used to predict subsoil pH but R-squared = 0.59 is not large so the prediction would be subject to error.
> hist(soil$diff,bor="red",col="green") > t0 <- (mean(soil$diff)-0)/sqrt(var(soil$diff)/length(soil$diff)) > t0 [1] -0.4793321 > 2*pt(t0,length(soil$diff)-1) [1] 0.6463151
95% confidence interval for the mean difference
> mean(soil$diff)+c(-1,1)*qt(0.975,7)*sqrt(var(soil$diff)/8) [1] -0.2224937 0.1474937
There is no evidence (P = 0.65) that the mean subsoil pH differs from the mean surface pH at any given location.
A dot plot, box plot or histogram would also be good.
> stem(soil$diff) The decimal point is 1 digit(s) to the left of the | -2 | 310 -1 | 86 -0 | 0 | 1 | 7 2 | 56
Possible sources of variation: type of soil; whether soil has been cultivated recently; whether lime has been applied to the surface recently; whether fertilizer has been applied recently; soil moisture; sampling tool; contamination of subsoil by surface soil when sampling; analytical techniqes; quality assurance at the analysis laboratory; etc.
Using the variance of the differences from (a), we will need 209 observations.
> var(soil$dif) [1] 0.04896429 > var(soil$diff)*(qnorm(.975)/.03)^2 [1] 208.9937
> marg poly brand 1 14.1 I 2 13.6 I 3 14.4 I 4 14.3 I 5 12.8 P 6 12.5 P 7 13.4 P 8 13.0 P 9 12.3 P 10 16.8 M 11 17.2 M 12 16.4 M 13 17.3 M 14 18.0 M 15 18.1 F 16 17.2 F 17 18.7 F 18 18.4 F > boxplot(split(marg$poly,marg$brand)[c("I","P","M","F")], xlab="brand",ylab="PAPFUA (%)") Note that a multiple dot plot would also be acceptable. > anova(lm(poly~brand, data=marg)) Analysis of Variance Table Response: poly Df Sum Sq Mean Sq F value Pr(>F) brand 3 84.764 28.255 103.77 8.428e-10 *** Residuals 14 3.812 0.272 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> anova(lm(poly~brand, data=marg))$"Mean Sq"[2] 0.2722857 > anova(lm(poly~brand, data=marg))$"Mean Sq"[2]/c(qchisq(.975,14)/14,qchisq(.025,14)/14) [1] 0.1459477 0.6772403
There is strong evidence (P << 0.01) that the percentage of physiologically active polyunsaturated fatty acids varies between brands.
> steel$ratef <- as.factor(steel$rate) > steel rate loss ratef 1 14 280 14 2 18 350 18 3 18 330 18 4 40 470 40 5 43 500 43 6 43 490 43 7 43 470 43 8 45 560 45 9 112 1200 112
> fitsteel1 <- lm(loss~rate,data=steel) > summary(fitsteel1) Call: lm(formula = loss ~ rate, data = steel) Residuals: Min 1Q Median 3Q Max -57.92 -30.30 13.67 32.22 52.22 Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 132.0732 25.2162 5.238 0.00120 ** rate 9.2057 0.5037 18.278 3.63e-07 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 Residual standard error: 41.69 on 7 degrees of freedom Multiple R-Squared: 0.9795, Adjusted R-squared: 0.9765 F-statistic: 334.1 on 1 and 7 DF, p-value: 3.633e-07 > anova(fitsteel1) Analysis of Variance Table Response: loss Df Sum Sq Mean Sq F value Pr(>F) rate 1 580634 580634 334.08 3.633e-07 *** Residuals 7 12166 1738 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> fitsteel2 <- lm(loss~ratef,data=steel) > anova(fitsteel2) Analysis of Variance Table Response: loss Df Sum Sq Mean Sq F value Pr(>F) ratef 5 592133 118427 532.92 0.0001279 *** Residuals 3 667 222 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
> fitsteel <- lm(loss~rate+ratef,data=steel) > anova(fitsteel) Analysis of Variance Table Response: loss Df Sum Sq Mean Sq F value Pr(>F) rate 1 580634 580634 2612.852 1.649e-05 *** ratef 4 11499 2875 12.937 0.03101 * Residuals 3 667 222 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 95% Confidence Interval for the Residual Variance (either of the following is acceptable)
The point estimate is 222.2, the 95% confidence interval is (71.3, 3089).
> anova(fitsteel)$"Mean Sq"[3] 222.2222 > anova(fitsteel)$"Mean Sq"[3]/c(qchisq(.975,3)/3, qchisq(.025,3)/3) [1] 71.31342 3089.34773
The point estimate is 1738, the 95% confidence interval is (760, 7199).
> anova(fitsteel1)$"Mean Sq"[2] 1738.015 > anova(fitsteel1)$"Mean Sq"[2]/c(qchisq(.975,7)/7, qchisq(.025,7)/7) [1] 759.7755 7199.4370 > plot(steel$rate,steel$loss,pch=19) > abline(fitsteel1)
There is evidence (P = 0.03) that the mean weight loss varies non-linearly with SO2 deposition rate. We could stop there. However, the line looks straight enough on the graph and R2 = 0.98 is quite large, so the line should be usable for linear interpolation. This justifies ignoring the lack of fit and we can test the slope, which is highly significant (P << 0.01).
If we don't assume the linear relationship, we can't make a prediction. However, the scatter of points about the fitted line is quite tight, so a linear prediction should be valid. While rate = 43 is close to the observed data and so the prediction of 528 g/m2 should be reliable, rate = 80 is in a region where no observations were taken so the prediction of 869 g/m2 is less reliable.
> predict(fitsteel1,data.frame(rate=c(43,80))) 1 2 527.9181 868.5288
13-9 (p. 640).
Anneal temperature/time is coded as 1 = 900/60, 2 = 900/180, 3 =
950/60, 4 = 1000/15, 5 = 1000/30.
Polysilicon doping is coded 1 = 1*1020, 2 =
2*1020.
> fittrans <- lm(curr~doping*anneal, data=transistor) > anova(fittrans) Analysis of Variance Table Response: curr Df Sum Sq Mean Sq F value Pr(>F) doping 1 1.442 1.442 25.2313 0.0005194 *** anneal 4 124.238 31.059 543.5196 1.203e-11 *** doping:anneal 4 0.809 0.202 3.5405 0.0477319 * Residuals 10 0.571 0.057 --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1
The interaction is (just) significant at the 5% level, so we conclude that both polysilicon doping and anneal time/temperature affect base current. Since the interaction is not highly significant, we should also check the main effects; both polysilicon doping and anneal time/temperature are highly significant.
> interactplot <- function (y, facta, factb, xlab = deparse(substitute(factb)), ylab = deparse(substitute(y)), main = paste("Interaction plot by", deparse(substitute(facta))), ...) { values <- sapply(split(y, facta:factb), mean) matplot(matrix(values, ncol = nlevels(facta)), type = "l", xlab = xlab, ylab = ylab, ...) title(main = main) legend(1, max(values), levels(facta), lty = 1:nlevels(facta), col = 1:nlevels(facta)) invisible() } > attach(transistor) > interactplot(curr,doping,anneal,ylab="current") > interactplot(curr,anneal,doping,ylab="current")
Since there are 20 residuals but only 10 degrees of freedom, there are strong dependencies between the residuals and these show up as symmetries in the plots. In particular, the plot of residuals versus fitted values shows the positive residuals mirrored by negative residuals and the negative quadrant of the normal QQ plot mirrors the positive quadrant. Nonetheless, the scatter is reasonably random and the normal QQ plot is reasonably straight, suggesting that the model is adequate and the assumptions of the model are reasonably well satisfied.
> plot(fittrans)