Verifying your Assumptions
Independence
- Review the experimental protocol and check out the equipment
used. Could your subjects confer together and share their
opinions? Did they know whether they were assigned to the
treatment or to the control group? Could your instrumentation lose
its calibration? Was there contamination carried over from one
determination to the next? What you can do here will depend on the
context of the study and on how much information, in addition to
the numerical data, is available to you.
- If you suspect autocorrelation, you can check the data with
lagged scatter plots or with Time Series Analysis, but these
methods are only useful if there are several hundred observations
in the series; a rough sketch of a lagged scatter plot is given
below.
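As a concrete illustration, here is a minimal Python sketch of a
lag-1 scatter plot together with the lag-1 sample autocorrelation.
(The course uses MINITAB; Python with numpy and matplotlib is used
here only for illustration, and the AR(1) series is simulated, so
substitute your own data for y.)

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Simulated AR(1) series so the lag-1 plot shows visible structure;
    # replace y with your own series of several hundred observations.
    n = 500
    y = np.empty(n)
    y[0] = rng.normal()
    for t in range(1, n):
        y[t] = 0.7 * y[t - 1] + rng.normal()

    # Lag-1 scatter plot: each observation against the one before it.
    plt.scatter(y[:-1], y[1:], s=10)
    plt.xlabel("y[t]")
    plt.ylabel("y[t+1]")
    plt.title("Lag-1 scatter plot")
    plt.show()

    # Lag-1 sample autocorrelation; values near 0 are consistent
    # with independence.
    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
    print(f"lag-1 autocorrelation: {r1:.3f}")

If the data really are independent, the lag-1 scatter plot should
look like a shapeless cloud.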
Normality
- Try histograms, stem-and-leaf plots, box plots, and probability
plots (look for these in MINITAB; they aren't covered in this
course), anything that will indicate whether the shape of the
distribution departs too much from a symmetric bell curve. None of
these plots will tell you much unless you have at least 50
observations.
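Here is a minimal Python sketch of three of these plots drawn side
by side: a histogram, a box plot, and a normal probability plot.
(The data are simulated for illustration; replace x with your own.)

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10, scale=2, size=100)  # replace with your data

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].hist(x, bins=15)
    axes[0].set_title("Histogram")
    axes[1].boxplot(x)
    axes[1].set_title("Box plot")
    # Probability plot: points close to the line suggest normality.
    stats.probplot(x, dist="norm", plot=axes[2])
    axes[2].set_title("Normal probability plot")
    plt.tight_layout()
    plt.show()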
Homoscedasticity
- If you are comparing means in different groups, you generally
want to assume that the variance is the same in each group. You
can compare two variances using the F-test (a sketch is given
after this list), but that test isn't very powerful unless you
have a large sample in each group, and it isn't even valid unless
you have independent normal data from each group (see above!).
- Grouped Box Plots are a useful way of screening the data to
see if the dispersion is more or less comparable in each group;
the second sketch below draws them before and after a
transformation.
- If the variance increases with the mean (groups with larger
means are more variable), then try log or square-root
transformations and redraw the grouped box plots to see if the
variances are more equal with the transformed data. Of course,
comparing means of transformed data isn't the same as comparing
means in the original scale, but that is an unavoidable problem
with transformations.
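To make the F-test concrete, here is a minimal Python sketch using
scipy. (The two groups are simulated for illustration; substitute
your own data for a and b.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.normal(10, 2, size=40)  # replace with your own groups
    b = rng.normal(10, 3, size=35)

    s2a, s2b = np.var(a, ddof=1), np.var(b, ddof=1)

    # Put the larger sample variance in the numerator so that F >= 1.
    if s2a >= s2b:
        F, df1, df2 = s2a / s2b, len(a) - 1, len(b) - 1
    else:
        F, df1, df2 = s2b / s2a, len(b) - 1, len(a) - 1

    # Two-sided p-value from the upper tail of the F distribution.
    p = min(2 * stats.f.sf(F, df1, df2), 1.0)
    print(f"F = {F:.3f} on ({df1}, {df2}) df, p = {p:.4f}")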
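And here is a sketch of grouped box plots drawn before and after a
log transformation. The three groups are simulated as lognormal so
that the spread grows with the mean, which is exactly the situation
the transformation is meant to fix.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    # Simulated lognormal groups: larger means, larger spreads.
    groups = [rng.lognormal(mean=m, sigma=0.5, size=50)
              for m in (1.0, 1.5, 2.0)]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.boxplot(groups)
    ax1.set_xticklabels(["A", "B", "C"])
    ax1.set_title("Original scale: spread grows with the mean")
    ax2.boxplot([np.log(g) for g in groups])
    ax2.set_xticklabels(["A", "B", "C"])
    ax2.set_title("Log scale: spreads roughly equal")
    plt.tight_layout()
    plt.show()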
Robustness
A statistical method is said to be "Robust" if it does what it is
supposed to do even if the assumptions aren't satisfied. Generally,
methods are more robust in large samples than in small samples.
This is frustrating, because it is only with large samples that you
can test the assumptions.
An example is the t-test. The t distribution is only justified if
the data are independent and normal, but with enough degrees of
freedom the t distribution becomes effectively a standard normal.
Then it doesn't make much difference whether you treat the variance
as known or estimated, and the assumption that (n-1)s^2/sigma^2 has
a Chi-square distribution (which is only true for normal data) is
no longer important.
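You can see this convergence numerically. Here is a small Python
sketch comparing upper 2.5% critical values of the t distribution
with the standard normal value of about 1.96:

    from scipy import stats

    # As the degrees of freedom grow, the t critical value
    # approaches the standard normal critical value.
    for df in (2, 5, 10, 30, 100, 1000):
        print(f"df = {df:4d}: t critical value = "
              f"{stats.t.ppf(0.975, df):.4f}")
    print(f"standard normal critical value = "
          f"{stats.norm.ppf(0.975):.4f}")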