Statistics 2MA3 - Test #2 Solutions
2003-03-08
The marking scheme is indicated in red. Full Marks = 40.
Q1 [9]
- Bayes' Theorem says that if a sample space is partitioned into k mutually exclusive events E_1, ..., E_k, then
P(E_i | A) = P(A | E_i) P(E_i) / { P(A | E_1) P(E_1) + ... + P(A | E_k) P(E_k) }.
Give three interesting facts about the life and work of Thomas Bayes (1702-1761).
- A parameter is a scalar or vector that indexes a family of probability distributions.
A statistic is any function of the observations in a sample; it may not include any unknown parameters. The distribution of a statistic is called a sampling distribution; it describes how the statistic will vary from one sample to another.
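As a small illustration of the formula, here is a sketch in R. The function name bayes_posterior and its argument names are made up for this example; the inputs are the FEV screening quantities from Q3 (smoker / nonsmoker as a two-event partition, A = "FEV < 3").

```r
# Posterior probabilities P(E_i | A) for a partition E_1, ..., E_k.
# prior      : vector of P(E_i), must sum to 1
# likelihood : vector of P(A | E_i)
# (hypothetical helper, written only to illustrate the formula)
bayes_posterior <- function(prior, likelihood) {
  stopifnot(length(prior) == length(likelihood),
            isTRUE(all.equal(sum(prior), 1)))
  joint <- likelihood * prior   # P(A | E_i) P(E_i)
  joint / sum(joint)            # divide by P(A), the sum over the partition
}

# E_1 = smoker (prevalence 0.22), E_2 = nonsmoker
prior      <- c(smoker = 0.22, nonsmoker = 0.78)
likelihood <- c(pnorm((3 - 3.5) / 0.6),   # P(FEV < 3 | smoker)
                pnorm((3 - 4) / 0.5))     # P(FEV < 3 | nonsmoker)
post <- bayes_posterior(prior, likelihood)
post
```

The first element of post is the probability of being a smoker given FEV < 3, and the two elements sum to 1.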
Q2 [9]
- The distribution of X is discrete uniform, with each possible value from 1 to 4 having the same probability 1/4. Hence
E[X] = 1*(1/4) + 2*(1/4) + 3*(1/4) + 4*(1/4) = 2.5
E[X^2] = 1*(1/4) + 4*(1/4) + 9*(1/4) + 16*(1/4) = 7.5
Var[X] = E[X^2] - (E[X])^2 = 7.5 - 6.25 = 1.25
- Since Y is the sum of 9 independent realizations of X, E[Y] = 9*E[X] = 22.5 and Var[Y] = 9*Var[X] = 11.25. By the Central Limit Theorem the distribution of Y will be approximately normal so, approximately,
P[Y > 26] = 1 - Φ((26.5 - 22.5)/sqrt(11.25)) = 1 - Φ(1.19257) = 0.117,
where Φ is the standard normal distribution function. (Note that I have used a continuity correction: because the actual score must be an integer, I have approximated the probability of getting 26 or less by taking the area under the normal curve up to 26.5. You will get a slightly different answer if you omit the continuity correction.)
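The arithmetic in both parts of Q2 can be reproduced in R. The sketch below also computes the exact distribution of Y by repeated convolution, which is not part of the test solution and is included only as a check on the normal approximation; all variable names are chosen for this example.

```r
# Distribution of a single score X: discrete uniform on 1..4
x <- 1:4
p <- rep(1/4, 4)
EX   <- sum(x * p)          # E[X]
EX2  <- sum(x^2 * p)        # E[X^2]
VarX <- EX2 - EX^2          # Var[X]

# Y is the sum of n = 9 independent copies of X
n     <- 9
mu    <- n * EX
sigma <- sqrt(n * VarX)

# Normal approximation to P[Y > 26], with and without the continuity correction
approx_cc   <- 1 - pnorm((26.5 - mu) / sigma)
approx_nocc <- 1 - pnorm((26   - mu) / sigma)

# Exact pmf of Y: convolve the pmf of X with itself n times
pmf_y <- 1                  # point mass at 0 before any scores are added
for (i in seq_len(n)) {
  new <- numeric(length(pmf_y) + length(p) - 1)
  for (j in seq_along(p)) {
    idx <- j:(j + length(pmf_y) - 1)
    new[idx] <- new[idx] + p[j] * pmf_y
  }
  pmf_y <- new
}
names(pmf_y) <- n:(4 * n)   # Y takes the values 9, 10, ..., 36
exact <- sum(pmf_y[as.character(27:36)])   # P[Y > 26] = P[Y >= 27]

round(c(approx_cc = approx_cc, approx_nocc = approx_nocc, exact = exact), 4)
```

Comparing the three numbers shows how much the continuity correction matters for a sum of only 9 terms.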
Q3 [9]
> sens <- pnorm((3-3.5)/.6)
> spec <- 1-pnorm((3-4)/.5)
> prev <- 0.22
> c(sens=sens, spec=spec, prev=prev)
sens spec prev
0.2023284 0.9772499 0.2200000
> pvp <- sens*prev/(sens*prev + (1-spec)*(1-prev))
> pvp
[1] 0.7149717
A subject is classified as a smoker if FEV < 3; since FEV ~ N(3.5, 0.6^2) for smokers,
the sensitivity of the test is Φ((3 - 3.5)/0.6) = 0.202.
Also, since FEV ~ N(4, 0.5^2) for nonsmokers, the specificity is 1 - Φ((3 - 4)/0.5) = 0.977.
If the prevalence is 22%, Bayes' Theorem gives PV+ = 0.715.
Q4 [3]
Assume a Poisson distribution for the number X of accidents in a given
week; if the mean really is 1.6, then the probability of getting 4 or
more is easily computed as
P(X >= 4) = 1-exp(-1.6)*(1 + 1.6 + (1.6^2)/2 + (1.6^3)/6) = 0.0788
Since this is greater than 5%, we conclude that 4 accidents is not
significantly high and hence there is no evidence that the mean rate has
increased.
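The tail probability can be checked in R, either by typing in the formula above or with the built-in Poisson distribution function ppois. This is only a sketch; the variable names are chosen for this example.

```r
lambda <- 1.6   # hypothesized mean number of accidents per week

# P(X >= 4) from the explicit sum of the first four Poisson terms
p_by_hand <- 1 - exp(-lambda) * (1 + lambda + lambda^2 / 2 + lambda^3 / 6)

# Same probability from the Poisson cdf: P(X >= 4) = 1 - P(X <= 3)
p_ppois <- ppois(3, lambda, lower.tail = FALSE)

c(by_hand = p_by_hand, ppois = p_ppois)
p_ppois > 0.05   # TRUE means 4 accidents is not significant at the 5% level
```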
Q5 [10]
> dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)
> n <- length(dhr)
> dbar <- mean(dhr)
> s <- sqrt(var(dhr))
> c(n=n, dbar=dbar, s=s)
n dbar s
15.000000 3.600000 5.395766
> qt(.975,n-1)
[1] 2.144787
> dbar + c(-1,1)*qt(.975,n-1)*s/sqrt(n)
[1] 0.6119246 6.5880754
> qnorm(.975)
[1] 1.959964
> dbar + c(-1,1)*qnorm(.975)*s/sqrt(n)
[1] 0.869416 6.330584
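The t-based interval in the session above can be cross-checked with R's built-in t.test, which applies the same one-sample formula to the differences. This is a sketch for verification, not the method required on the test; the names tt and ci are chosen for this example.

```r
dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)

# One-sample t-test of H0: mean difference = 0, with a 95% confidence interval
tt <- t.test(dhr, mu = 0, conf.level = 0.95)
ci <- as.numeric(tt$conf.int)   # drop the conf.level attribute

ci           # should agree with dbar -/+ qt(.975, n-1) * s / sqrt(n)
tt$p.value   # two-sided p-value for the same hypothesis
```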
- Whether computed as (0.61, 6.59) with the t-distribution or, approximately, as (0.87, 6.33) with the normal distribution, the 95% confidence interval for the mean difference in heart rate excludes 0. So, at the 95% level of confidence (or 5% level of significance), there is evidence that the true mean difference between heart rate before treatment and heart rate after treatment is not zero. This analysis assumes that the subjects are independent and the differences are normally distributed.
- The zero differences carry no information one way or the other; it would bias the result to treat them as positive or negative, so it is best to omit them and say that we have 11 positive differences out of 13 non-zero differences. Under the hypothesis that the median difference is zero, the number of positive differences will be distributed Bin(13, 0.5). The probability of getting 11 or more positive differences out of 13 would then be
78/8192 + 13/8192 + 1/8192 = 92/8192 = 0.0112,
which is less than 5% (and remains so if doubled to 0.0225 for a two-sided test), so there is reason to claim that the median difference between heart rate before and after treatment is not zero.
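The sign-test calculation can be reproduced in R from the same data. The sketch below counts the non-zero differences and gets the upper tail of Bin(13, 0.5) two ways, with pbinom and with the built-in binom.test; the variable names are chosen for this example.

```r
dhr <- c(4, 6, 5, 2, 5, -8, 1, -2, 8, 0, 12, 13, 1, 0, 7)

nonzero <- dhr[dhr != 0]     # omit the zero differences
n_used  <- length(nonzero)   # number of non-zero differences
n_pos   <- sum(nonzero > 0)  # number of positive differences

# P(X >= n_pos) for X ~ Bin(n_used, 0.5)
p_upper <- pbinom(n_pos - 1, n_used, 0.5, lower.tail = FALSE)

# Same one-sided probability from the exact binomial test
bt <- binom.test(n_pos, n_used, p = 0.5, alternative = "greater")

c(n_used = n_used, n_pos = n_pos, p_upper = p_upper, binom_test = bt$p.value)
```

Changing alternative to "two.sided" gives the two-sided version of the same test.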