Statistics 4P03/6P03 Assignments 1999-2000
All assignments must be submitted no later than 5:00 PM on Thursday
27 April 2000, the final day of the undergraduate examination period.
A01 2000-02-03
This is to be a first report on the Seal Vocalization Data.
- Carry out Exploratory Data Analysis, with an emphasis on
graphical methods. The work should primarily be done in Splus but
you can also try using a spreadsheet or another statistics
package.
- Write a summary - in words - of what you have learned from
EDA. Emphasize the important and meaningful features of the
graphs, but don't just describe in words what the graphs look
like.
- The EDA should leave you with some conjectures and unanswered
questions. Try to (a) formulate some questions for the data
owners, (b) give some suggestions for further studies, and (c)
propose some modeling ideas.
A02 1999-03-23
This is to be a first report on the Data Mining Case Study. Use
either Splus or SAS. Splus may be more convenient, but this is an
opportunity to learn SAS.
- Divide the sample randomly into equal-size training and
validation samples.
- Transform (if you wish) and scale the data, scaling the
training and validation samples separately.
- Using the training sample, do a PCA on the explanatory
variables, select a subset of the components, and fit a logistic
regression on the (binary) objective.
- Predict the objective variable for the validation data and
show the predicted values on a box plot, split by the observed
value of the objective.
- Plot a gains chart assuming 1% responders in the customer
database.
- Do what you can to satisfy yourself that you are getting a
high gain without overfitting.
- How would you describe the model to a non-statistician? Are
the principal components interpretable?
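As a rough illustration of the sampling, scaling and gains-chart
steps above, here is a minimal sketch in Python rather than Splus or
SAS (the PCA and the logistic fit are left to those packages). The
function names split_half, standardize and gains_curve and the toy
scores are made up for illustration. The sketch assumes that the
study sample over-represents responders, so each case is reweighted
to make responders 1% of the customer database; check that reading
against the case study description before relying on it.

```python
import random
from statistics import mean, stdev


def split_half(rows, seed=0):
    """Randomly divide rows into equal-size training and validation lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    # With an odd number of rows, one row is left out to keep the halves equal.
    return shuffled[:half], shuffled[half:2 * half]


def standardize(rows):
    """Scale each column to mean 0 and SD 1 using this sample's own statistics.

    Assumes no column is constant (a zero SD would divide by zero).
    """
    columns = list(zip(*rows))
    centers = [mean(col) for col in columns]
    spreads = [stdev(col) for col in columns]
    return [[(x - c) / s for x, c, s in zip(row, centers, spreads)]
            for row in rows]


def gains_curve(scores, outcomes, responder_rate=0.01):
    """Cumulative gains curve, reweighted to an assumed responder rate.

    Returns (depth, captured) points: depth is the weighted fraction of the
    database contacted, captured is the fraction of all responders reached.
    Assumption (not from the assignment): the sample over-represents
    responders, so cases are reweighted to the stated rate.
    """
    n_pos = sum(outcomes)
    n_neg = len(outcomes) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need both responders and non-responders")
    w_pos = responder_rate / n_pos          # weight of one responder
    w_neg = (1.0 - responder_rate) / n_neg  # weight of one non-responder
    # Rank cases from highest to lowest predicted score (ties keep input order).
    ranked = sorted(zip(scores, outcomes), key=lambda pair: pair[0],
                    reverse=True)
    depth, captured = 0.0, 0.0
    curve = [(0.0, 0.0)]
    for _, y in ranked:
        if y == 1:
            depth += w_pos
            captured += 1.0 / n_pos
        else:
            depth += w_neg
        curve.append((depth, captured))
    return curve


# Toy validation scores and observed outcomes, invented for illustration.
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]
curve = gains_curve(scores, outcomes)
for depth, captured in curve:
    print(f"{depth:6.3f} {captured:6.3f}")
```

Plotting the second coordinate against the first gives the gains
chart; the diagonal from (0, 0) to (1, 1) is what random selection
would achieve.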
A03 2000-03-23
Consider the Symphony Hamilton audience survey from the February
12 concert. The survey results and a number of basic statistics are
available as Splus objects.
Write a consultant's report addressing the following questions,
using only data from the survey and my report from the April 1995
survey.
- Describe the audience profile (geographic, occupational,
relationship to the orchestra, other events attended).
- Has the profile changed since the April 1995 survey?
- Is the audience's rating of the performance significantly
different from the previous survey?
- What form of publicity should be used to market the orchestra
to new audience members who will return for future concerts?
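For the question about the performance rating, one possible approach
is a large-sample two-sided test on the difference in mean rating,
which needs only the mean, standard deviation and sample size from
each survey. The sketch below is in Python for illustration; the
function name and the input numbers are placeholders, not values from
either survey, and it assumes that the ratings are on a numeric scale
and that the 1995 report gives these summaries. If ratings are
available only as category counts, a chi-square test on the table of
counts would be the more natural choice.

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_test(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sided large-sample z test for a difference in mean rating.

    Uses unpooled variances and a normal approximation, so it is only
    reasonable when both samples are fairly large.
    """
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)  # SE of the difference
    z = (mean1 - mean2) / se
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p_value


# Placeholder summaries only, NOT taken from either survey.
z, p = mean_difference_test(4.2, 0.8, 200, 4.0, 0.9, 150)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```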