STATISTICS 3N03/3J04 - Test #3 Solutions

2003-11-25

Question 1

(a) Homoscedasticity is the condition where different groups within a population have the same variance. [1 mark]

A parameter is a scalar or vector that indexes a family of probability distributions. [1 mark]

A statistic is any function of the observations in a sample. It must not depend on any unknown parameters. [1 mark]

A pivotal quantity is a function of a statistic and the parameter of interest that follows a standard distribution; that distribution must not depend on any unknown parameters. [1 mark]

(b) To derive a test statistic from a pivotal quantity, replace the parameter of interest by its hypothesized value. The result is the test statistic, and the distribution of the pivotal quantity (which holds when the null hypothesis is true) is its reference distribution.

An example: for a random sample from a Normal population, (x_bar - m)/(s/sqrt(n)) ~ t(n-1), which gives the test statistic t0 = (x_bar - m0)/(s/sqrt(n)) with reference distribution t(n-1) for testing the hypothesis that m = m0. [4 marks]
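As an illustration of (b), here is a minimal Python sketch. The function names and the five data values are hypothetical, made up for this example only; the point is that the test statistic is just the pivotal quantity evaluated at m = m0.

```python
import math
import statistics

def t_pivot(sample, m):
    """(x_bar - m)/(s/sqrt(n)): pivotal quantity, ~ t(n-1) for a Normal sample with mean m."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divisor n - 1
    return (x_bar - m) / (s / math.sqrt(n))

def t_test_statistic(sample, m0):
    """Replace m by its hypothesized value m0; the reference distribution is t(n-1)."""
    return t_pivot(sample, m0), len(sample) - 1

# Hypothetical data, for illustration only.
t0, df = t_test_statistic([1.0, 2.0, 3.0, 4.0, 5.0], m0=2.0)
print(f"t0 = {t0:.4f}, df = {df}")  # prints: t0 = 1.4142, df = 4
```

The P-value would then come from the t(4) table, exactly as for the pivotal quantity.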


Question 2

Biomonitoring is difficult because of problems of instrumentation (how would you measure exposure non-invasively?), logistics (you would have to find and organize hundreds or thousands of volunteers in different geographic and socio-economic regions), ethics (the participants would be children who cannot volunteer themselves and stand to gain nothing personally from the study), compliance (you may not know whether the participants are following the protocol), and cost. Computer simulation avoids all of these issues. The simulation must be random (stochastic) because the mean level of exposure to arsenic is of much less interest than the upper quantiles of the exposure distribution; the mean level might be well below a safe threshold while the upper quantiles are in the toxic range. [2 marks]
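A small Monte Carlo sketch of the last point, under an entirely hypothetical exposure model: lognormal values in arbitrary units with an arbitrary "safe" threshold of 4.0. None of these choices come from the arsenic study. With them, the mean is about 1.65, well below the threshold, yet roughly 8% of simulated values exceed it, so a summary based on the mean alone would miss the cases of concern.

```python
import random
import statistics

# HYPOTHETICAL exposure model: unitless, right-skewed lognormal values.
# The distribution, its parameters and the threshold are illustrative assumptions.
random.seed(2003)
exposures = [random.lognormvariate(0.0, 1.0) for _ in range(100_000)]
threshold = 4.0  # hypothetical "safe" threshold, same arbitrary units

mean_exposure = statistics.fmean(exposures)
percentiles = statistics.quantiles(exposures, n=100)  # 1st..99th percentiles
p95, p99 = percentiles[94], percentiles[98]
frac_above = sum(e > threshold for e in exposures) / len(exposures)

print(f"mean = {mean_exposure:.2f}, 95th percentile = {p95:.2f}, "
      f"99th percentile = {p99:.2f}, fraction above threshold = {frac_above:.3f}")
```

statistics.fmean and statistics.quantiles require Python 3.8 or later.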


Question 3

This code computes a two-sided P-value for a two-sample F-test of the hypothesis that two population variances are equal, against the alternative that they are not equal. The test is valid only for independent samples from Normal populations. [5 marks]
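The code from the test paper is not reproduced in these solutions. The following is a minimal, standard-library-only Python sketch of the same computation, assuming the two-sided P-value is taken as twice the smaller tail area of F(n1-1, n2-1). The F CDF is obtained from the regularized incomplete beta function, evaluated by the continued-fraction (modified Lentz) method familiar from Numerical Recipes. All function names are hypothetical.

```python
import math

def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h

def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b

def f_cdf(f, d1, d2):
    """P(F <= f) for F ~ F(d1, d2)."""
    if f <= 0.0:
        return 0.0
    return reg_inc_beta(d1 / 2.0, d2 / 2.0, d1 * f / (d1 * f + d2))

def f_test_two_sided(s1_sq, n1, s2_sq, n2):
    """F0 = s1^2/s2^2 and its two-sided P-value for H0: equal variances."""
    f0 = s1_sq / s2_sq
    lower = f_cdf(f0, n1 - 1, n2 - 1)
    return f0, min(1.0, 2.0 * min(lower, 1.0 - lower))

f0, p = f_test_two_sided(8.563, 5, 5.917, 5)  # sample variances from Question 4(a)
print(f"F0 = {f0:.2f}, two-sided P = {p:.2f}")
```

With the Question 4(a) variances this gives F0 of about 1.45 and a two-sided P-value of about 0.73, consistent with the P > 0.5 quoted there. Where SciPy is available, scipy.stats.f.cdf could replace the hand-rolled f_cdf.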


Question 4

(a) Correct analysis: A two-sample t-test with a two-sided P-value to test the hypothesis that both treatments have the same mean percentage dimethoate recovery against the alternative that the means are different. [15 marks for the correct analysis with P-value, 12 marks for the correct analysis without P-value, maximum of 10 marks for the invalid analysis done completely, maximum of 10 marks if BOTH the correct and the invalid analysis are given. Give up to a 3-mark bonus if the F test for homoscedasticity is done correctly.]

Graph: comparative box plots, stem & leaf plots or dot diagrams are acceptable.

Stem: tens; Leaf: units, tenths (e.g. 2 | 39 represents 23.9; each line covers two units)

lime         urea
2 |          3 | 59
2 | 39       3 |
2 | 47       3 | 87
2 | 62       4 | 16,18
2 | 85,96    4 | 32

n1 = n2 = 5, x1_bar = 26.58, x2_bar = 40.24, s1^2 = 5.917, s2^2 = 8.563, sp^2 = 7.24
t0 = -8.027, df = 8, P < 0.001.
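The arithmetic above can be checked with a short Python sketch. The data values are read off the stem & leaf plots (they reproduce the reported means and variances). Rather than computing the P-value exactly, the sketch brackets it with the upper 0.0005 point of t(8), 5.041 from standard t tables, mirroring a table lookup.

```python
import math
import statistics

# Data read off the stem & leaf plots above (stem: tens; leaf: units, tenths).
lime = [23.9, 24.7, 26.2, 28.5, 29.6]
urea = [35.9, 38.7, 41.6, 41.8, 43.2]

n1, n2 = len(lime), len(urea)
x1_bar, x2_bar = statistics.mean(lime), statistics.mean(urea)
s1_sq, s2_sq = statistics.variance(lime), statistics.variance(urea)

# Pooled variance, appropriate under homoscedasticity.
sp_sq = ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)
t0 = (x1_bar - x2_bar) / math.sqrt(sp_sq * (1 / n1 + 1 / n2))
df = n1 + n2 - 2

# Two-sided P < 0.001 exactly when |t0| exceeds the upper 0.0005 point of t(8).
T_0005_DF8 = 5.041  # from standard t tables
print(f"x1_bar = {x1_bar:.2f}, x2_bar = {x2_bar:.2f}, s1^2 = {s1_sq:.3f}, "
      f"s2^2 = {s2_sq:.3f}, sp^2 = {sp_sq:.2f}, t0 = {t0:.3f}, df = {df}")
print("P < 0.001" if abs(t0) > T_0005_DF8 else "P >= 0.001")
```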

Assumptions: normality of each sample (see graphs; the samples are too small to tell), independence (this cannot be checked from the data; we can hope that the soil specimens were chosen randomly and that there was no way for the lime test results to affect the urea results), homoscedasticity (looks OK, see graphs; could test with F0 = 8.563/5.917 = 1.45, reference distribution F(4, 4), giving a two-sided P > 0.5, so there is no evidence from these data of heteroscedasticity).
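With two samples of five, the reference distribution F(4, 4) has a closed-form CDF: with x = f/(1 + f), P(F <= f) = 3x^2 - 2x^3, which is the regularized incomplete beta function I_x(2, 2). The sketch below uses it to turn "P > 0.5" into a number; it is valid only when both degrees of freedom equal 4, and the function name is hypothetical.

```python
def f44_cdf(f):
    """P(F <= f) for F ~ F(4, 4): I_x(2, 2) = 3x^2 - 2x^3 with x = f/(1 + f)."""
    x = f / (1.0 + f)
    return 3 * x**2 - 2 * x**3

f0 = 8.563 / 5.917  # larger sample variance over the smaller
lower = f44_cdf(f0)
p_two_sided = 2 * min(lower, 1 - lower)
print(f"F0 = {f0:.2f}, two-sided P = {p_two_sided:.2f}")  # prints: F0 = 1.45, two-sided P = 0.73
```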

Conclusions: There is strong evidence (P < 0.001) from these data that the mean percentage dimethoate recovery is not the same for soil treated with lime and soil treated with urea.

Other valid analyses: A t-test without a P-value ("The hypothesis that the mean percentage dimethoate recovery is the same for both treatments is rejected at the 5% level of significance."), or compute a 95% confidence interval for the difference in mean percentage dimethoate recovery, lime minus urea, (-17.58, -9.74), and note that it does not include 0. You were asked to give a P-value, however.
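A sketch of the confidence-interval route in Python, using the summary statistics from part (a) and the upper 0.025 point of t(8), 2.306 from standard t tables:

```python
import math

# Summary statistics from Question 4(a).
n1 = n2 = 5
x1_bar, x2_bar = 26.58, 40.24
sp_sq = 7.24
T_025_DF8 = 2.306  # upper 0.025 point of t(8), from standard t tables

half_width = T_025_DF8 * math.sqrt(sp_sq * (1 / n1 + 1 / n2))
diff = x1_bar - x2_bar
lower, upper = diff - half_width, diff + half_width
print(f"95% CI for mu1 - mu2: ({lower:.2f}, {upper:.2f})")  # prints: (-17.58, -9.74)
print("0 in interval" if lower <= 0 <= upper else "0 not in interval")
```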

Invalid analysis: A paired t-test; t = -8.154, df = 4, P = 0.001. Assumptions: the paired t-test analysis is not valid because there is no pairing here; the lime-treated specimens are unrelated to the urea-treated specimens.

(b) Correct analysis: A paired t-test with a one-sided P-value to test the hypothesis that the mean difference in zinc concentration between bottom and surface water is 0 against the alternative that the mean concentration is higher at the bottom. [15 marks for the correct analysis with P-value, 12 marks for the correct analysis without P-value, maximum of 10 marks for the invalid analysis done completely, maximum of 10 marks if BOTH the correct and the invalid analysis are given.]

Graph: Stem & leaf plot, histogram, dot plot, box plot of differences (bottom-surface) are all acceptable.

Stem: tenths; Leaf: hundredths, thousandths
0 | 15,28
0 |
1 | 02,07,21
1 | 77

Assumptions: Normality (the stem & leaf plot looks OK, but the sample is too small to tell); independence (cannot be checked because the sample is so small).

n = 6, d_bar = 0.091667, sd^2 = 0.0036831, sd = 0.060688, t0 = 3.700, df = 5, 0.01 > P > 0.005 (one-sided).
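The paired calculation can be sketched in Python the same way. The six differences are read off the stem & leaf plot above, and the one-sided P-value is bracketed with the standard t-table values for df = 5 (3.365 for 0.01 and 4.032 for 0.005) instead of being computed exactly.

```python
import math
import statistics

# Differences (bottom - surface) read off the stem & leaf plot above
# (stem: tenths; leaf: hundredths, thousandths).
d = [0.015, 0.028, 0.102, 0.107, 0.121, 0.177]

n = len(d)
d_bar = statistics.mean(d)
sd = statistics.stdev(d)
t0 = d_bar / (sd / math.sqrt(n))  # paired t statistic for H0: mean difference = 0
df = n - 1

# Upper 0.01 and 0.005 points of t(5), from standard t tables.
T_01_DF5, T_005_DF5 = 3.365, 4.032
print(f"d_bar = {d_bar:.6f}, sd = {sd:.6f}, t0 = {t0:.3f}, df = {df}")
if T_01_DF5 < t0 < T_005_DF5:
    print("0.01 > P > 0.005 (one-sided)")
```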

Conclusions: There is evidence (0.01 > P > 0.005) from these data that the mean zinc concentration is higher at the bottom than at the surface.

Other valid analyses: A t-test without a P-value ("The hypothesis that the mean zinc concentration is the same at the bottom as at the surface is rejected at the 5% level of significance."), or compute a 95% confidence interval for the mean difference (0.0280, 0.155) and note that it does not include 0.
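The confidence interval for the mean difference can be checked with a similar sketch, using the summary statistics from part (b) and the upper 0.025 point of t(5), 2.571 from standard t tables:

```python
import math

n, d_bar, sd = 6, 0.091667, 0.060688  # summary statistics from Question 4(b)
T_025_DF5 = 2.571  # upper 0.025 point of t(5), from standard t tables

half_width = T_025_DF5 * sd / math.sqrt(n)
lower, upper = d_bar - half_width, d_bar + half_width
print(f"95% CI for the mean difference: ({lower:.4f}, {upper:.4f})")  # prints: (0.0280, 0.1554)
```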

Invalid analysis: A two-sample t-test; t = 1.010, df = 10, P > 0.1. Graph: comparative box plots, stem & leaf plots or dot diagrams. Assumptions: normality of each sample (looks OK, see graphs), independence (the samples are not independent, so the two-sample t-test analysis is not valid), homoscedasticity (looks OK, see graphs). Conclusions: There is no evidence (P > 0.1) from these data that the mean zinc concentration is not the same at the bottom as at the surface.

