PART A
Complete Test #1, working individually and treating it as a take-home test. Please e-mail me if there is anything you don't understand.
PART B
Analyse the breast cancer survival data from Table 3.7-1 on p. 103 of Bishop, Fienberg & Holland (1975), Discrete Multivariate Analysis, MIT Press. Try both a log-linear analysis and a logistic analysis. Try to duplicate the models they give, and try to find the simplest model that explains the data well. Show that for this example, the log-linear and logistic analyses lead to exactly the same conclusions.
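As a rough sketch of how the log-linear and logistic fits can be lined up, the R code below uses the built-in UCBAdmissions table purely as a stand-in; it is not the Bishop, Fienberg & Holland data, and the object names (ucb, wide, loglin_fit, logit_fit) are illustrative assumptions. The idea is that a logistic model for a binary response corresponds to a log-linear model that includes the full interaction among the explanatory factors plus the matching response-by-factor terms.

```r
# Stand-in data: built-in 2 x 2 x 6 table (Admit x Gender x Dept).
# This is NOT Table 3.7-1; substitute the survival data for the assignment.
ucb <- as.data.frame(UCBAdmissions)   # columns: Admit, Gender, Dept, Freq

# Log-linear model: Poisson counts with the full Gender:Dept margin,
# plus Admit:Gender and Admit:Dept associations.
loglin_fit <- glm(Freq ~ Admit * (Gender + Dept) + Gender * Dept,
                  family = poisson, data = ucb)

# Logistic model: one row per Gender x Dept cell, with the two
# response counts side by side.
adm  <- subset(ucb, Admit == "Admitted")
rej  <- subset(ucb, Admit == "Rejected")
wide <- merge(adm, rej, by = c("Gender", "Dept"), suffixes = c(".yes", ".no"))
logit_fit <- glm(cbind(Freq.yes, Freq.no) ~ Gender + Dept,
                 family = binomial, data = wide)

# The residual deviances and residual degrees of freedom should agree.
c(loglinear = deviance(loglin_fit), logistic = deviance(logit_fit))
c(loglinear = df.residual(loglin_fit), logistic = df.residual(logit_fit))
```

Dropping a term such as Gender from the logistic model corresponds to dropping Admit:Gender from the log-linear model, so the same sequence of model comparisons can be run in both forms.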
PART C
The data in Table 2.11 on p. 39 of Rosner (2000), Fundamentals of Biostatistics, 5th Edition, were collected on patients discharged from a hospital. Using GLMs, determine which factors predict duration of stay. Try normal error with identity link and gamma error with inverse link. Try modelling log(duration) and sqrt(duration) with normal error and identity link. Which of these models is best, in terms of fitting the data well and giving useful predictions? Look at the ANOVA table and diagnostic plots. If the result of the glm fit is glmobj, then plot(glmobj) will give diagnostic plots. Use the dispersion parameter to estimate the gamma shape parameter. Be sure that any categorical variables are entered into the model as factors or class variables. Use family = gaussian(link=identity) and family = Gamma(link=inverse).
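The fits described above might be set up along the following lines. This is a minimal sketch on simulated data, not the Rosner table: the data frame sim, its columns age and sex, and the coefficients used to generate duration are all made-up assumptions, chosen only so that the code runs. The shape estimate uses the fact that, for a gamma GLM, the dispersion is the reciprocal of the shape parameter.

```r
set.seed(1)

# Simulated stand-in data; NOT the values from Rosner's Table 2.11.
n   <- 200
sim <- data.frame(
  age = round(runif(n, 20, 80)),
  sex = factor(sample(c("F", "M"), n, replace = TRUE))  # categorical -> factor
)
mu <- 1 / (0.25 - 0.002 * sim$age)        # inverse link: 1/mu is linear in age
sim$duration <- rgamma(n, shape = 4, rate = 4 / mu)

# The four candidate models
fit_norm  <- glm(duration ~ age + sex,
                 family = gaussian(link = identity), data = sim)
fit_gamma <- glm(duration ~ age + sex,
                 family = Gamma(link = inverse), data = sim)
fit_log   <- glm(log(duration) ~ age + sex,
                 family = gaussian(link = identity), data = sim)
fit_sqrt  <- glm(sqrt(duration) ~ age + sex,
                 family = gaussian(link = identity), data = sim)

# Sequential analysis-of-deviance (ANOVA) table for the gamma fit
anova(fit_gamma, test = "F")

# Gamma shape estimated as 1 / dispersion
disp      <- summary(fit_gamma)$dispersion
shape_hat <- 1 / disp
shape_hat

# Diagnostic plots, four panels on one page
par(mfrow = c(2, 2))
plot(fit_gamma)
```

Note that deviances and AIC values are not directly comparable between models fitted to duration and models fitted to log(duration) or sqrt(duration), because the response scales differ; the diagnostic plots are probably the more useful basis for comparison there.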
PART D
In Section 13.5 on pp. 596-604 of Rosner (2000) there is an example of the Mantel-Haenszel test for stratified categorical data. Reproduce this analysis as a log-linear GLM. Your chi-square values will be a bit different because Rosner uses Pearson's chi-square with continuity correction, but you will use the log-linear deviance.
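One way to frame the Mantel-Haenszel test as a log-linear comparison is sketched below, again using the built-in UCBAdmissions table as a stand-in rather than Rosner's data; the object names are illustrative assumptions. The test of association within strata is taken as the drop in deviance when the two-factor association term is added to the conditional-independence model.

```r
# Stand-in stratified 2 x 2 x K table (Admit x Gender, stratified by Dept).
# This is NOT the Rosner example; substitute the data from Section 13.5.
ucb <- as.data.frame(UCBAdmissions)

# Conditional independence of Admit and Gender within each Dept stratum
cond_indep <- glm(Freq ~ Dept * (Admit + Gender),
                  family = poisson, data = ucb)

# Add a common Admit:Gender association (same odds ratio in every stratum)
homog_assoc <- update(cond_indep, . ~ . + Admit:Gender)

# Deviance (likelihood-ratio) chi-square for the association, on 1 df
lr_stat <- deviance(cond_indep) - deviance(homog_assoc)
lr_df   <- df.residual(cond_indep) - df.residual(homog_assoc)
lr_p    <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

# Classical Mantel-Haenszel test (continuity-corrected by default)
mh <- mantelhaen.test(UCBAdmissions)

c(deviance_chisq = lr_stat, mh_chisq = unname(mh$statistic))
c(deviance_p = lr_p, mh_p = mh$p.value)
```

The residual deviance of homog_assoc can also be read as a check on whether the odds ratio is reasonably constant across strata, which is the assumption behind pooling in the Mantel-Haenszel approach.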