Suppose you want to find a two-sided 99% confidence interval for the variance of a normal population. How many observations are needed to ensure that the upper limit is no more than 4 times the lower limit? Will this calculation be valid for non-normal data?
If your time to run a marathon has a mean of 4 hr 10 min with a standard deviation of 15 min, and your friend's time has a mean of 3 hr 55 min with a standard deviation of 10 min, and you both run in the same race, what is the probability that you will finish ahead of your friend? State any assumptions you make and discuss their validity in this example.
Samples of four different brands of diet margarine were analysed to determine the percentage of physiologically active polyunsaturated fatty acids (PAPFUA). Give an appropriate analysis, including an ANOVA table and a graph. Give a 95% confidence interval for the residual variance. State any assumptions you make and do what you can to test the assumptions. State your conclusions.
Brand PAPFUA (%) Imperial: 14.1 13.6 14.4 14.3 Parkay: 12.8 12.5 13.4 13.0 12.3 Mazola: 16.8 17.2 16.4 17.3 18.0 Fleischmann's: 18.1 17.2 18.7 18.4
(a) Two types of fish attractors, one made from clay pipes and the other from cement blocks and brush, were used over 8 consecutive time periods spanning two years at Lake Tohopekaliga, Florida. The data give fish caught per fishing day in each time period.
period 1 2 3 4 5 6 7 8 pipe 6.64 7.89 1.83 0.42 0.85 0.29 0.57 0.63 brush 9.73 8.21 2.17 0.75 1.61 0.75 0.83 0.56
Is there evidence that one attractor is more effective than the other? Which is the most appropriate analysis: a two-sample t-test, a paired t-test, or a paired t-test on log-transformed data? Give a p-value. State your assumptions and your conclusions.
(b) Answer (a) using the sign test. State your assumptions and your conclusions.
(c) Suppose that you are planning a larger study of the same attractors in the same lake. If one attractor attracts 50% more fish than the other, how many time periods would you need to ensure that a 5% test has a power of 99%?
A camera was developed to determine the gray level over the lens of the human eye. Twelve patients were randomly selected, six with normal eyes and another six with cataractous eyes. One eye was tested on each patient. The data show the gray level on a scale of 1 (black) to 256 (white).
patient 1 2 3 4 5 6 cataractous 161 140 136 171 106 149 normal 158 182 185 145 167 177
(a) Display the data on an appropriate graph. Is there evidence of a significant difference between the two groups? Give a p-value, state your assumptions and your conclusions, and do what you can to assess the validity of your assumptions.
(b) How useful would this be as a screening test for cataracts? Plot a ROC curve. Choose a suitable cutoff. At this cutoff, what would the predictive value positive be if the prevalence of cataracts were 12% in the population being screened?
The following data are from a 1979 paper classifying 445 college students according to their level of marijuana use, and their parents' use of alcohol and psychoactive drugs. Do the data suggest that parental usage and student usage are independent in the population from which the sample was drawn? State your assumptions and your conclusions.
Student level of marijuana use never occasional regular Parental neither 141 54 40 use of alcohol one 68 44 51 & drugs both 17 11 19
Analyze the following data from a study to determine the effect of different plate materials and ambient temperature on the effective life (in hours) of a storage battery. There were 3 replicates at each combination of material and temperature. Give a 99% confidence interval for the residual variance. State your assumptions and your conclusions.
Ambient temperature low medium high Plate material 1 130, 74, 155 34, 80, 40 20, 82, 70 2 150, 159, 188 136, 106, 122 25, 58, 70
A study of nitrogen emissions from power boilers reported x = burner area liberation rate (in Mbtu/hr-ft2) and y = NOx emission rate (in ppm) for 14 boilers.
x 100 125 125 150 150 200 200 250 250 300 300 350 400 400 y 150 140 180 210 190 320 280 400 430 440 390 600 610 670
(a) Fit a straight line to the data by least squares, with NOx emission rate as the dependent variable. Plot the data and the fitted line on a graph. Can NOx emission rate be predicted as a linear function of burner area liberation rate? Present your analysis in an ANOVA table with F-Tests for non-linearity and for the slope of the regression line. Give a 95% confidence interval for the residual variance. State your assumptions and your conclusions.
(b) Predict the NOx emission rate at burner area liberation rates 10 and 325. How reliable do you think your predictions are?
(a) In a genetics study, you plan to observe 20 F1 offspring. Your theory says that the probability of getting a wild-eyed female is 1/4 but you are concerned that the actual probability may be greater than 1/4. Using the exact binomial distribution, find a critical region such that the type I error rate is as close as possible to 5%. What is the exact significance level of the test? What is the probability of a type II error if the true probability of a wild-eyed female is 1/2? If the true probability of a wild-eyed female is 3/4?
(b) Suppose you do the study and find that 10 out of the 20 F1 offspring are wild-eyed females. Apply the test you derived in (a). State your conclusions. Compute a P-value and state your conclusions in terms of a P-value.
(c) Repeat (b) using the normal approximation to the binomial (with continuity correction).
In Assignment #2, Chapter 4 Question C you looked at data giving the number of leaves on 53 plants and and you computed the variance-mean ratio to test whether it was overdispersed. Because you didn't know the sampling distribution of the variance-mean ratio, you could not do a test of significance or compute a P-value. Try the same simulation method you used to study the sampling distribution of the sample mean empirically in Assignment #1 Part C Question 2. Generate 1000 samples, each with 53 independent observations from a Poisson distribution with the same mean as the leaf data. Compute the variance-mean ratio for each sample, and use functions like max(), sort() and quantile() to see where the variance-mean ratio for the leaf data lies in this distribution. What would you give as a P-value in this case? Note that it is convenient to define a function vmratio() to compute the ratio.
> vmratio <- function(x) var(x)/mean(x) > vmratio(nleaf) [1] 2.269805 > nleafdat <- apply(matrix(rpois(53*1000,mean(nleaf)), ncol=1000), 2, vmratio)