S2MA3 Exercise #4

Statistics 2MA3 - Exercise #4

Updated 2001-03-31 15:40

Do these exercises by hand with a pocket calculator (where feasible), then check your work with R.
These problems and solutions cover all topics to the end of the course.

One-Sample Tests and Confidence Intervals

Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner (HOSPITAL.DAT, HOSPITAL.DOC on the data disk). Plot a histogram of duration of stay. Compute the mean duration of stay. Assuming that the standard deviation of duration of stay is 4 days, find a 2-sided 95% confidence interval for the mean duration of stay, and find a p-value to test the hypothesis that the mean length of stay is 5 days against the alternative that it is longer. Test the hypothesis that the standard deviation is 4 days, with a 2-sided 5% test. Without assuming a value for the standard deviation, find a 2-sided 95% confidence interval for the mean duration of stay, and find a p-value to test the hypothesis that the mean length of stay is 5 days against the alternative that it is longer. State your conclusions. Which of the above calculations may be invalid?
The mean concentration of a solution is supposed to be 60%. You suspect that it might be off by as much as 2% in either direction. From past experience, you know that your analytical method has a standard deviation of 2.6%. How many observations would be required to test this hypothesis at the 5% level, and ensure that the Type II Error Rate is no more than 1%?
The percentage of voters supporting a given political party was 60% at the last poll. You suspect that this might have changed by as much as 2% in either direction. How many voters should you sample to test this hypothesis at the 5% level, and ensure that the Type II Error Rate is no more than 1%? What would you do if the recommended number is larger than you can afford? If you sampled 3000 voters and 1623 were supporters of that party, give a 2-sided 95% confidence interval for percent support.
How many independent normal observations would be required to ensure that the upper limit of a 99% confidence interval for the variance is no more than 5 times the lower limit?
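
As a cross-check on the hand calculations for the two sample-size problems and the poll interval above, here is a minimal sketch in plain Python (the course itself uses R, so treat this only as an independent check). It assumes the usual normal-approximation formulas for a two-sided test, n = (z(1-α/2) + z(1-β))² σ² / δ² for a mean and the analogous expression for a proportion, and a Wald interval for the poll. The function names are illustrative and do not come from the course materials.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def n_for_mean(sigma, delta, alpha=0.05, beta=0.01):
    """Sample size for a two-sided z test of a mean (normal approximation)."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    return math.ceil((z_a + z_b) ** 2 * sigma ** 2 / delta ** 2)

def n_for_proportion(p0, p1, alpha=0.05, beta=0.01):
    """Sample size for a two-sided test of p = p0 when the true value is p1."""
    z_a = Z.inv_cdf(1 - alpha / 2)
    z_b = Z.inv_cdf(1 - beta)
    num = z_a * math.sqrt(p0 * (1 - p0)) + z_b * math.sqrt(p1 * (1 - p1))
    return math.ceil(num ** 2 / (p1 - p0) ** 2)

def wald_ci(successes, n, level=0.95):
    """Two-sided Wald confidence interval for a proportion."""
    p_hat = successes / n
    z = Z.inv_cdf(1 - (1 - level) / 2)
    half = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half

# Values taken from the problem statements above.
print(n_for_mean(sigma=2.6, delta=2.0))
# The required n differs slightly for a shift down or up; plan for the larger.
print(max(n_for_proportion(0.60, 0.58), n_for_proportion(0.60, 0.62)))
print(wald_ci(1623, 3000))
```

The proportion formula is evaluated at both 58% and 62% because the variance p(1-p) depends on the alternative; taking the larger n covers a change in either direction.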

Two-Sample Tests and Confidence Intervals

Using the nutrition data described in Table 2.16 on p. 42 of Rosner (VALID.DAT, VALID.DOC on the data disk), compute a p-value to test the hypothesis that the mean alcohol consumption from the food frequency questionnaire is the same as from the diet record. Do what you can to test any assumptions you make. State your conclusions.
Repeat the previous analysis using the sign test (Sect. 9.2, p. 333) to test the hypothesis that the median difference between the two measures is zero. State your assumptions and your conclusions.
Do the Microbiology example, problems 8.148-8.152 on p. 326 of Rosner. Give an appropriate graphical display of the data. State any assumptions you make and test any assumptions you can test.
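
For the sign test in the second problem above, the exact p-value is a binomial tail probability with success probability 1/2, which is easy to check outside R. The sketch below is plain Python; it assumes that zero differences have already been dropped, and the counts in the usage line are made up for illustration and are not taken from VALID.DAT.

```python
from math import comb

def sign_test_p(n_pos, n_neg):
    """Exact two-sided sign test p-value.

    n_pos and n_neg are the numbers of positive and negative differences;
    ties (zero differences) must be discarded before counting.
    """
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical counts: 8 positive and 2 negative differences.
print(sign_test_p(8, 2))
```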

One-Factor Analysis of Variance

You have analyzed the Microbiology example, problems 8.148-8.152 on p. 326 of Rosner, as a two-sample t-test. Repeat the analysis, this time as an analysis of variance for a one-factor design. Show that the F statistic in the anova table is the square of the two-sample t statistic and has the same p-value. Show that the mean squared error (also called the mean squared residual or residual variance) is the same as the pooled variance estimate in the t-test. The graphical displays, assumptions and conclusions are exactly the same for both analyses.
Give a 95% confidence interval for the conditional variance of pod weight, given treatment, that is, the residual mean squared error after fitting treatment as a factor.
Analyze the Obstetrics data on p. 569 of Rosner. Answer problem 12.14 with a comparative box plot and an ANOVA table. Give a 95% confidence interval for the residual variance σ².
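
The identities asked for in the first problem of this section (F equals t squared, and the mean squared error equals the pooled variance) can be checked numerically before attempting them on the real data. The following is a rough sketch in plain Python using two small made-up groups, not the Rosner microbiology data; the helper names are hypothetical. The p-values also agree, because the square of a t variable with ν degrees of freedom follows an F distribution with 1 and ν degrees of freedom.

```python
from statistics import mean, variance

def pooled_t(a, b):
    """Two-sample t statistic with pooled variance; returns (t, pooled variance)."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    t = (mean(a) - mean(b)) / (sp2 * (1 / na + 1 / nb)) ** 0.5
    return t, sp2

def one_way_f(groups):
    """One-factor ANOVA; returns (F statistic, mean squared error)."""
    n = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    ms_between = ss_between / (len(groups) - 1)
    mse = ss_within / (n - len(groups))
    return ms_between / mse, mse

# Made-up numbers for illustration only.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]
t, sp2 = pooled_t(a, b)
f, mse = one_way_f([a, b])
print(t ** 2, f)   # the two values should agree
print(sp2, mse)    # pooled variance and mean squared error should agree
```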

Two-Factor Analysis of Variance

To see if the coverage of light blue latex interior paint depends on either the brand of paint or the brand of roller used, two gallons of each of three brands of paint were applied using each of two brands of roller. Present your results in a two-factor ANOVA table. State your assumptions and your conclusions. Give a 95% confidence interval for the residual variance σ².

Paint brand:	1	1	1	1	2	2	2	2	3	3	3	3
Roller brand:	1	1	2	2	1	1	2	2	1	1	2	2
Coverage (ft²):	454	460	446	440	446	445	444	449	439	432	442	443
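
Because the paint design is balanced (two gallons in every paint-by-roller cell), the sums of squares for the two-factor table can be built up directly from group means. The sketch below is plain Python using the paint data above; it assumes a model with both main effects and their interaction, and the variable and helper names are illustrative. It produces only the sums of squares and F ratios; the p-values and the confidence interval still need F and chi-square tables or R.

```python
from statistics import mean

paint  = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
roller = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]
cover  = [454, 460, 446, 440, 446, 445, 444, 449, 439, 432, 442, 443]

def ss_between(keys, values, grand):
    """Sum over groups of (group size) * (group mean - grand mean)^2."""
    groups = {}
    for k, v in zip(keys, values):
        groups.setdefault(k, []).append(v)
    return sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())

grand = mean(cover)
ss_total = sum((y - grand) ** 2 for y in cover)
ss_paint = ss_between(paint, cover, grand)
ss_roller = ss_between(roller, cover, grand)
ss_cells = ss_between(list(zip(paint, roller)), cover, grand)
ss_inter = ss_cells - ss_paint - ss_roller   # valid because the design is balanced
ss_resid = ss_total - ss_cells

# Degrees of freedom: 3 paints, 2 rollers, 6 cells, 12 observations.
rows = [("paint", ss_paint, 2), ("roller", ss_roller, 1),
        ("paint:roller", ss_inter, 2), ("residual", ss_resid, 6)]
ms_resid = ss_resid / 6
for name, ss, df in rows:
    ms = ss / df
    f = "" if name == "residual" else round(ms / ms_resid, 3)
    print(name, df, round(ss, 3), round(ms, 3), f)
```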

Simple Linear Regression

Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner. (HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Plot duration of stay (dependent variable) against age (independent variable). Fit a straight line to the data and add it to the graph. Summarize the fit in an ANOVA table and state your assumptions and conclusions.
Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner. (HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Plot duration of stay (dependent variable) against first temperature following admission (independent variable). Fit a straight line to the data and add it to the graph. Summarize the fit in an ANOVA table and state your assumptions and conclusions.

Multiple Regression

Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner. (HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Give a pairs plot and a correlation matrix for the following variables: duration of stay, age, first temperature and first white blood cell count. Fit the model duration ~ age+temp1+wbc1. Summarize the fit in an ANOVA table. Plot the observed values against the fitted values and add a diagonal line to the plot. Plot the residuals against the fitted values. State your assumptions and conclusions.
Continuing with the Hospital Stay data, fit the model duration ~ temp1+age+wbc1 and give the ANOVA table. Discuss how it differs from the previous fit.

Simple Linear Regression with a Test for Lack of Fit

Analyze the following data from a study of ion-beam-assisted etching of aluminum with chlorine. The independent variable x is chlorine flow and the dependent variable y is the etch rate. Give an appropriate graph. State any assumptions you make and do what you can to test the assumptions. State your conclusions.
Predict the etch rate when flow = 1.75. Give a 95% confidence interval for the residual variance.

x	1.5	1.5	2.0	2.5	2.5	3.0	3.5	3.5	4.0
y	23.0	24.5	25.0	30.0	33.5	40.0	40.5	47.0	49.0
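
The etching data have replicate observations at flows 1.5, 2.5 and 3.5, which is what makes a lack-of-fit test possible: the residual sum of squares splits into pure error (variation among replicates at the same x) and lack of fit (the remainder). The following is a minimal sketch of that decomposition in plain Python, using the data above and assuming an ordinary least-squares straight line; the variable names are illustrative. The p-value for the F ratio and the confidence interval for the variance still require tables or R.

```python
from statistics import mean

x = [1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0]
y = [23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0]

# Least-squares slope and intercept.
xbar, ybar = mean(x), mean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar

# Residual sum of squares about the fitted line (n - 2 degrees of freedom).
ss_resid = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))

# Pure error: squared deviations of y from the mean of its replicate group.
reps = {}
for xi, yi in zip(x, y):
    reps.setdefault(xi, []).append(yi)
ss_pure = sum(sum((v - mean(g)) ** 2 for v in g) for g in reps.values())
df_pure = len(x) - len(reps)        # observations minus distinct x values

# Lack of fit is what remains of the residual sum of squares.
ss_lof = ss_resid - ss_pure
df_lof = len(reps) - 2              # distinct x values minus fitted parameters
f_lof = (ss_lof / df_lof) / (ss_pure / df_pure)

pred = intercept + slope * 1.75     # predicted etch rate at flow = 1.75
print(slope, intercept, pred)
print(ss_lof, df_lof, ss_pure, df_pure, f_lof)
```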

R x C Contingency Tables

Use the Pulmonary Disease data from problems 10.90-10.91 on p. 422 of Rosner. Analyse as a 3 x 2 contingency table and give a right-tail p-value. State your conclusions.
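
A 3 x 2 table has (3 - 1)(2 - 1) = 2 degrees of freedom, and for 2 degrees of freedom the right-tail chi-square probability has the closed form exp(-X²/2), so the p-value can be checked without tables. The sketch below is plain Python; the counts in the usage example are made up for illustration and are not the Rosner pulmonary disease data, and the function name is hypothetical.

```python
from math import exp

def chisq_3x2(table):
    """Pearson chi-square statistic and right-tail p-value for a 3 x 2 table."""
    row_tot = [sum(r) for r in table]
    col_tot = [sum(c) for c in zip(*table)]
    n = sum(row_tot)
    df = (len(row_tot) - 1) * (len(col_tot) - 1)
    if df != 2:
        raise ValueError("the closed-form tail used here only holds for 2 df")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat, exp(-stat / 2)

# Hypothetical counts: rows are three groups, columns are two outcomes.
stat, p = chisq_3x2([[10, 20], [15, 15], [20, 10]])
print(stat, p)
```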
