Verifying your Assumptions
Independence
- Review the experimental protocol and check the equipment
used. Could your subjects confer together and share their
opinions? Did they know whether they were assigned to the
treatment or to the control group? Could your instrumentation lose
its calibration? Was there contamination carried over from one
determination to the next? What you can do here will depend on the
context of the study and on how much information, in addition to
the numerical data, is available to you.
- If you suspect autocorrelation, you can check the data with
lagged scatter plots or with time series analysis, but these
methods are only useful if there are several hundred observations
in the series.
Normality
- Try histograms, stem-and-leaf plots, box plots, and probability
plots (look for these in MINITAB or SPSS, or use qqnorm() in R;
they aren't covered in this course), or anything else that will
indicate whether the shape of the distribution departs too much
from a symmetric bell curve. None of these plots will tell you
much unless you have at least 40 or 50 observations.
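A probability plot puts the ordered data against normal quantiles; normal data fall close to a straight line. A rough Python sketch using `scipy.stats.probplot` (SciPy's analogue of R's qqnorm(); an illustration only, not course software):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=10, scale=2, size=50)
skewed_sample = rng.exponential(scale=2, size=50)

# probplot returns the quantile pairs and a straight-line fit; the r value
# is the correlation of the probability plot, close to 1 when the points
# lie on a line (i.e., when the data look normal).
_, (_, _, r_normal) = stats.probplot(normal_sample)
_, (_, _, r_skewed) = stats.probplot(skewed_sample)
print(round(r_normal, 3), round(r_skewed, 3))
```

The skewed (exponential) sample bends away from the line, so its probability-plot correlation is noticeably lower than the normal sample's.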
Homoscedasticity
- If you are comparing means in different groups, you usually
want to assume that the variance is the same in each group. You
can compare two variances using the F-test, but that test isn't
very powerful unless you have a large sample in each group, and it
isn't even valid unless you have independent normal data from each
group (see above!).
- Comparative Box Plots are a useful way of checking the data to
see if the dispersion is more or less comparable in each group.
- If the variance increases with the mean (groups with larger
means are more variable), then try log or square-root
transformations. Redraw the comparative box plots to see if the
variances are more equal with the transformed data. Of course,
comparing means of transformed data isn't the same as comparing
means in the original scale, but that is an unavoidable problem
with transformations.
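The variance-ratio F-test and the effect of a log transformation can both be sketched in a few lines of Python with SciPy (an illustration, not course software; `variance_ratio_test` is a name invented here). The simulated groups are built so that the spread grows with the mean, the situation the note describes:

```python
import numpy as np
from scipy import stats

def variance_ratio_test(a, b):
    """Two-sided F-test of H0: var(a) == var(b).
    Only valid for independent normal samples (see above!)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfa, dfb = len(a) - 1, len(b) - 1
    # Two-sided p-value: double the smaller tail probability.
    p = 2 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))
    return f, p

rng = np.random.default_rng(2)
# Lognormal groups: the group with the larger mean is also more variable.
group1 = np.exp(rng.normal(1.0, 0.5, size=30))
group2 = np.exp(rng.normal(2.0, 0.5, size=30))

f_raw, p_raw = variance_ratio_test(group1, group2)
# After a log transformation the two groups have the same spread.
f_log, p_log = variance_ratio_test(np.log(group1), np.log(group2))
print(round(f_raw, 2), round(f_log, 2))
```

On the raw scale the variance ratio is far from 1; on the log scale it is close to 1, which is exactly what redrawing the comparative box plots after the transformation would show.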
Linearity
- If you have repeated x-values in a simple linear regression, you can compute "Pure Error" and "Lack of Fit" terms in the regression ANOVA to test the assumption that the true relationship is a straight line. A less satisfactory method is to fit a quadratic model and test to see if the fit is significantly better than the linear model.
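The pure-error / lack-of-fit decomposition can be computed directly: the residual sum of squares from the fitted line splits into pure error (variation of y within groups sharing the same x) and lack of fit (the remainder). A minimal sketch in Python (an illustration only; `lack_of_fit_test` is a name invented here):

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(x, y):
    """Pure-error / lack-of-fit F-test for simple linear regression.
    Requires repeated x values so pure error can be estimated."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line
    ss_resid = np.sum((y - (intercept + slope * x)) ** 2)
    # Pure error: within-group variation at each distinct x value.
    ss_pe, n_groups = 0.0, 0
    for xv in np.unique(x):
        group = y[x == xv]
        ss_pe += np.sum((group - group.mean()) ** 2)
        n_groups += 1
    ss_lof = ss_resid - ss_pe
    df_lof, df_pe = n_groups - 2, n - n_groups
    f = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f, stats.f.sf(f, df_lof, df_pe)      # upper-tail p-value

# Three replicates at each of five x values; the true curve is quadratic,
# so the lack-of-fit test should reject the straight-line model.
x = np.repeat([1, 2, 3, 4, 5], 3)
rng = np.random.default_rng(3)
y = 2 + 0.5 * x + 0.8 * (x - 3) ** 2 + rng.normal(0, 0.3, size=x.size)
f_stat, p_value = lack_of_fit_test(x, y)
print(round(f_stat, 1), round(p_value, 4))
```

The degrees of freedom are (number of distinct x values − 2) for lack of fit and (n − number of distinct x values) for pure error, so the test is only available when there are genuine replicates.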
Robustness
A statistical method is said to be "Robust" if it does what it is supposed to do even if the assumptions aren't satisfied. Generally, methods are more robust in large samples than they are with small samples. This is frustrating, because it is only with large samples that you can test the assumptions.
An example is the t-test. The t distribution is only justified if
the data are independent and normal, but with enough degrees of
freedom the t distribution becomes a standard normal, so it makes
little difference whether you treat the variance as known or
estimated. Hence the assumption that (n-1)s²/σ² follows a
chi-square distribution (which is only true for normal data) is no
longer important.
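The convergence of the t distribution to the standard normal is easy to see numerically. A quick Python check with SciPy (an illustration only) compares the 97.5th percentile of the t distribution, used for a two-sided test at the 5% level, with the normal value of about 1.96:

```python
from scipy import stats

# 97.5th percentile of t for increasing degrees of freedom,
# compared with the standard normal percentile (about 1.96).
for df in (5, 30, 100, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

By 30 degrees of freedom the t percentile is already close to 1.96, and by 1000 the difference is negligible, which is why the chi-square assumption about the variance estimate stops mattering in large samples.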