Statistics 2MA3 - Exercise #4
Updated 2001-03-31 15:40
Do these exercises by hand with a pocket calculator (where
feasible), then check your work with R.
These problems and solutions cover all topics
to the end of the course.
One-Sample Tests and Confidence Intervals
- Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner.
(HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Plot a histogram of
duration of stay. Compute the mean duration of stay. Assuming that
the standard deviation of duration of stay is 4 days, find a
2-sided 95% confidence interval for the mean duration of stay and
find a p-value to test the hypothesis that the mean length
of stay is 5 days against the alternative that it is longer than
that. Test the hypothesis that the standard deviation is 4 days,
with a 2-sided 5% test. Without assuming a value for standard
deviation, find a 2-sided 95% confidence interval for the mean
duration of stay and find a p-value to test the hypothesis
that the mean length of stay is 5 days against the alternative
that it is longer than that. State your conclusions. Which of the
above calculations may be invalid?
- The mean concentration of a solution is supposed to be 60%.
You suspect that it might be off by as much as 2% in either
direction. From past experience, you know that your analytical
method has a standard deviation of 2.6%. How many observations
would be required to test the hypothesis that the mean is 60% with
a 2-sided test at the 5% level, and ensure that the Type II Error
Rate is no more than 1% when the true mean is off by 2%?
- The percentage of voters supporting a given political party
was 60% at the last poll. You suspect that this might have changed
by as much as 2% in either direction. How many voters should you
sample to test the hypothesis of no change at the 5% level, and
ensure that the Type II Error Rate is no more than 1% if support
has changed by 2%? What would you do if
the recommended number is larger than you can afford? If you
sampled 3000 voters and 1623 were supporters of that party, give a
2-sided 95% confidence interval for percent support.
- How many independent normal observations would be required to
ensure that the upper limit of a 99% confidence interval for the
variance is no more than 5 times the lower limit?
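The two sample-size exercises above (concentration and voter support) can be cross-checked with the usual normal-approximation formulas. The sketch below is in Python rather than R, purely for illustration, and uses only the standard library. It assumes the textbook approximation n = ((z_(1-alpha/2) + z_(1-beta)) * sd / delta)^2 for a mean with known standard deviation, the analogous one-sample binomial formula for a proportion, and a Wald interval for the observed proportion. The function names are made up for this example.

```python
from math import ceil, sqrt
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard normal quantile function


def n_for_mean(sd, delta, alpha=0.05, beta=0.01):
    """Approximate n for a 2-sided z-test of a mean with known sd."""
    return ceil(((z(1 - alpha / 2) + z(1 - beta)) * sd / delta) ** 2)


def n_for_proportion(p0, p1, alpha=0.05, beta=0.01):
    """Approximate n for a 2-sided one-sample test of p0 against p1."""
    v0, v1 = p0 * (1 - p0), p1 * (1 - p1)
    num = v0 * (z(1 - alpha / 2) + z(1 - beta) * sqrt(v1 / v0)) ** 2
    return ceil(num / (p1 - p0) ** 2)


def wald_ci(successes, n, level=0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p_hat = successes / n
    half = z(1 - (1 - level) / 2) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


n_mean = n_for_mean(sd=2.6, delta=2.0)
# Take the larger n over the two directions of change (58% and 62%).
n_prop = max(n_for_proportion(0.60, 0.58), n_for_proportion(0.60, 0.62))
ci_low, ci_high = wald_ci(1623, 3000)

print("n for the concentration problem:", n_mean)
print("n for the polling problem:", n_prop)
print("95%% CI for support: %.1f%% to %.1f%%" % (100 * ci_low, 100 * ci_high))
```

These formulas ignore the very small chance of rejecting in the wrong tail, so an exact power calculation may differ by an observation or two.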
Two-Sample Tests and Confidence Intervals
- Using the nutrition data described in Table 2.16 on p. 42 of
Rosner (VALID.DAT, VALID.DOC on the data disk), compute a
p-value to test the hypothesis that the mean alcohol
consumption from the food frequency questionnaire is the same as
from the diet record. Do what you can to test any assumptions you
make. State your conclusions.
- Repeat the previous analysis using the sign test (Sect. 9.2,
p. 333) to test the hypothesis that the median difference between
the two measures is zero. State your assumptions and your
conclusions.
- Do the Microbiology example, problems 8.148-8.152 on p.
326 of Rosner. Give an appropriate graphical display of the data.
State any assumptions you make and test any assumptions you can
test.
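For the sign-test exercise above, the p-value depends only on the numbers of positive and negative differences. The following is a minimal Python sketch of an exact 2-sided sign test based on the Binomial(n, 1/2) distribution. It assumes zero differences have already been discarded. The function name and the example counts are hypothetical and are not taken from VALID.DAT. For large samples a normal approximation to the binomial may be used instead.

```python
from math import comb


def sign_test_p(n_pos, n_neg):
    """Exact 2-sided sign-test p-value; ties are assumed already dropped."""
    n = n_pos + n_neg
    k = min(n_pos, n_neg)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts for illustration: 8 positive and 2 negative differences.
p_example = sign_test_p(8, 2)
print(round(p_example, 4))
```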
One-Factor Analysis of Variance
- You have analyzed the Microbiology example, problems
8.148-8.152 on p. 326 of Rosner, as a two-sample t-test.
Repeat the analysis, this time as an analysis of variance for a
one-factor design. Show that the F statistic in the anova table is
the square of the two-sample t statistic and has the same
p-value. Show that the mean squared error (also called the
mean squared residual or residual variance) is the same as the
pooled variance estimate in the t-test. The graphical displays,
assumptions and conclusions are exactly the same for both
analyses.
- Continuing with the Microbiology example, give a 95% confidence
interval for the conditional variance of pod weight, given
treatment, that is, the residual mean squared error after fitting
treatment as a factor.
- Analyze the Obstetrics data on p. 569 of Rosner. Answer
problem 12.14 with a comparative box plot and an ANOVA
table. Give a 95% confidence interval for the residual variance
sigma^2.
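The claim in the first exercise of this section, that with two groups the ANOVA F statistic equals the square of the pooled two-sample t statistic and the mean squared error equals the pooled variance, can be checked numerically. The Python sketch below uses two small made-up samples, not the Rosner data, and only the standard library. It illustrates the identity and is not a solution to the exercise.

```python
from math import sqrt
from statistics import fmean, variance

# Made-up samples for illustration only; NOT the Rosner data.
a = [1.2, 1.5, 1.1, 1.8]
b = [2.0, 1.7, 2.3, 2.1, 1.9]
na, nb = len(a), len(b)

# Two-sample t statistic with pooled variance.
pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
t_stat = (fmean(a) - fmean(b)) / sqrt(pooled_var * (1 / na + 1 / nb))

# One-factor ANOVA on the same two groups.
grand = fmean(a + b)
ss_between = na * (fmean(a) - grand) ** 2 + nb * (fmean(b) - grand) ** 2
ss_within = sum((v - fmean(a)) ** 2 for v in a) + sum((v - fmean(b)) ** 2 for v in b)
ms_between = ss_between / (2 - 1)     # 2 groups, so 1 degree of freedom
ms_error = ss_within / (na + nb - 2)  # residual degrees of freedom
f_stat = ms_between / ms_error

print("t^2 =", t_stat ** 2, " F =", f_stat)
print("pooled variance =", pooled_var, " MSE =", ms_error)
```

Because F on 1 and n-2 degrees of freedom is the distribution of the square of a t variable on n-2 degrees of freedom, the 2-sided t-test p-value and the F-test p-value agree as well.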
Two-Factor Analysis of Variance
- To see if the coverage of light blue latex interior paint
depends on either the brand of paint or the brand of roller used,
two gallons of each of three brands of paint were applied using
each of two brands of roller. Present your results in a two-factor
ANOVA table. State your assumptions and your conclusions. Give a
95% confidence interval for the residual variance sigma^2.
Paint brand:     |   1 |   1 |   1 |   1 |   2 |   2 |   2 |   2 |   3 |   3 |   3 |   3 |
Roller brand:    |   1 |   1 |   2 |   2 |   1 |   1 |   2 |   2 |   1 |   1 |   2 |   2 |
Coverage (ft^2): | 454 | 460 | 446 | 440 | 446 | 445 | 444 | 449 | 439 | 432 | 442 | 443 |
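As a check on hand calculation for the paint data, the sums of squares for a balanced two-factor design with interaction can be computed directly from group means. The sketch below is in Python, standard library only, and the helper name between_ss is made up for this example. It relies on the design being balanced (two observations in every paint-by-roller cell), so the sums of squares add up exactly. The p-values still have to be looked up from the F distribution, for example in tables or in R.

```python
from statistics import fmean

paint = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
roller = [1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 2]
cover = [454, 460, 446, 440, 446, 445, 444, 449, 439, 432, 442, 443]

grand = fmean(cover)


def between_ss(keys):
    """Sum over groups of n_group * (group mean - grand mean)^2."""
    groups = {}
    for k, y in zip(keys, cover):
        groups.setdefault(k, []).append(y)
    return sum(len(g) * (fmean(g) - grand) ** 2 for g in groups.values())


ss_paint = between_ss(paint)
ss_roller = between_ss(roller)
ss_cells = between_ss(list(zip(paint, roller)))
ss_inter = ss_cells - ss_paint - ss_roller   # valid because the design is balanced
ss_total = sum((y - grand) ** 2 for y in cover)
ss_error = ss_total - ss_cells

df = {"paint": 2, "roller": 1, "paint:roller": 2, "error": 6}
ss = {"paint": ss_paint, "roller": ss_roller,
      "paint:roller": ss_inter, "error": ss_error}
ms_error = ss_error / df["error"]

print("%-13s %3s %8s %8s %7s" % ("source", "df", "SS", "MS", "F"))
for name in ("paint", "roller", "paint:roller"):
    ms = ss[name] / df[name]
    print("%-13s %3d %8.2f %8.2f %7.3f" % (name, df[name], ss[name], ms, ms / ms_error))
print("%-13s %3d %8.2f %8.2f" % ("error", df["error"], ss_error, ms_error))
```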
Simple Linear Regression
- Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner.
(HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Plot duration of
stay (dependent variable) against age (independent variable). Fit
a straight line to the data and add it to the graph. Summarize the
fit in an ANOVA table and state your assumptions and conclusions.
- Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner.
(HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Plot duration of
stay (dependent variable) against first temperature following
admission (independent variable). Fit a straight line to the data
and add it to the graph. Summarize the fit in an ANOVA table and
state your assumptions and conclusions.
Multiple Regression
- Use the Hospital Stay data in Table 2.11 on p. 39 of Rosner.
(HOSPITAL.DAT, HOSPITAL.DOC on the data disk.) Give a pairs plot
and a correlation matrix for the following variables: duration of
stay, age, first temperature and first white blood cell count. Fit
the model duration ~ age+temp1+wbc1. Summarize the fit in
an ANOVA table. Plot the observed values against the fitted values
and add a diagonal line to the plot. Plot the residuals against
the fitted values. State your assumptions and conclusions.
- Continuing with the Hospital Stay data, fit the model
duration ~ temp1+age+wbc1 and give the ANOVA table. Discuss
how it differs from the previous fit.
Simple Linear Regression with a Test for Lack of Fit
- Analyze the following data from a study of ion-beam-assisted
etching of aluminum with chlorine. The independent variable x is
chlorine flow and the dependent variable y is the etch rate. Give
an appropriate graph. State any assumptions you make and do what
you can to test the assumptions. State your conclusions.
Predict the etch rate when flow = 1.75. Give a 95% confidence
interval for the residual variance.
x |  1.5 |  1.5 |  2.0 |  2.5 |  2.5 |  3.0 |  3.5 |  3.5 |  4.0 |
y | 23.0 | 24.5 | 25.0 | 30.0 | 33.5 | 40.0 | 40.5 | 47.0 | 49.0 |
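Because flows 1.5, 2.5 and 3.5 are each observed twice, the residual sum of squares from the straight-line fit can be split into pure error (variation among replicates at the same x) and lack of fit (what remains). The Python sketch below, standard library only, shows that split for the data above. It assumes the usual model of independent normal errors with constant variance, and it leaves the F-distribution p-value and the chi-square interval for the residual variance to tables or R.

```python
from statistics import fmean

x = [1.5, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 3.5, 4.0]
y = [23.0, 24.5, 25.0, 30.0, 33.5, 40.0, 40.5, 47.0, 49.0]
n = len(x)

# Least-squares line.
xbar, ybar = fmean(x), fmean(y)
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
slope = sxy / sxx
intercept = ybar - slope * xbar
ss_resid = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

# Pure error: variation among replicate y values at the same x.
groups = {}
for xi, yi in zip(x, y):
    groups.setdefault(xi, []).append(yi)
ss_pure = sum(sum((yi - fmean(g)) ** 2 for yi in g) for g in groups.values())
df_pure = n - len(groups)

# Lack of fit is what remains of the residual sum of squares.
ss_lof = ss_resid - ss_pure
df_lof = (n - 2) - df_pure
f_lof = (ss_lof / df_lof) / (ss_pure / df_pure)

pred = intercept + slope * 1.75
print("slope %.3f, intercept %.3f" % (slope, intercept))
print("lack of fit: SS %.3f on %d df" % (ss_lof, df_lof))
print("pure error:  SS %.3f on %d df" % (ss_pure, df_pure))
print("F for lack of fit: %.3f" % f_lof)
print("predicted etch rate at flow 1.75: %.2f" % pred)
```

A small F ratio for lack of fit gives no evidence against the straight-line model, in which case the pooled residual mean square is a reasonable estimate of the residual variance.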
R x C Contingency Tables
- Use the Pulmonary Disease data from problems
10.90-10.91 on p. 422 of Rosner. Analyze as a 3 x 2
contingency table and give a right-tail p-value. State your
conclusions.