Verifying your Assumptions
Independence
- Review the experimental protocol and check the equipment
used. Could your subjects confer together and share their
opinions? Did they know whether they were assigned to the
treatment or to the control group? Could your instrumentation lose
its calibration? Was there contamination carried over from one
determination to the next? What you can do here will depend on the
context of the study and on how much information, in addition to
the numerical data, is available to you.
- If you suspect autocorrelation, you can check the data with
lagged scatter plots or with time series analysis, but these
methods are only useful if there are several hundred observations
in the series.
Normality
- Try histograms, stem-and-leaf plots, box plots, and probability
plots (look for these in MINITAB or SPSS, or use qqnorm() in R;
they aren't covered in this course), or anything else that will
indicate whether the shape of the distribution departs too much
from a symmetric bell curve. None of these plots will tell you
much unless you have at least 40 or 50 observations.
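A probability plot puts the ordered data against normal quantiles; normal data fall close to a straight line. A rough Python sketch using `scipy.stats.probplot` (SciPy's analogue of R's qqnorm(); an illustration only, not course software):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
normal_sample = rng.normal(loc=10, scale=2, size=50)
skewed_sample = rng.exponential(scale=2, size=50)

# probplot returns the quantile pairs and a straight-line fit; the r value
# is the correlation of the probability plot, close to 1 when the points
# lie on a line (i.e., when the data look normal).
_, (_, _, r_normal) = stats.probplot(normal_sample)
_, (_, _, r_skewed) = stats.probplot(skewed_sample)
print(round(r_normal, 3), round(r_skewed, 3))
```

The skewed (exponential) sample bends away from the line, so its probability-plot correlation is noticeably lower than the normal sample's.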
Homoscedasticity
- If you are comparing means in different groups, you usually
want to assume that the variance is the same in each group. You
can compare two variances using the F-test, but that test isn't
very powerful unless you have a large sample in each group, and it
isn't even valid unless you have independent normal data from each
group (see above!).
- Comparative Box Plots are a useful way of checking the data to
see if the dispersion is more or less comparable in each group.
- If the variance increases with the mean (groups with larger
means are more variable), then try log or square-root
transformations. Redraw the comparative box plots to see if the
variances are more equal with the transformed data. Of course,
comparing means of transformed data isn't the same as comparing
means in the original scale, but that is an unavoidable problem
with transformations.
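The variance-ratio F-test and the effect of a log transformation can both be sketched in a few lines of Python with SciPy (an illustration, not course software; `variance_ratio_test` is a name invented here). The simulated groups are built so that the spread grows with the mean, the situation the note describes:

```python
import numpy as np
from scipy import stats

def variance_ratio_test(a, b):
    """Two-sided F-test of H0: var(a) == var(b).
    Only valid for independent normal samples (see above!)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    f = np.var(a, ddof=1) / np.var(b, ddof=1)
    dfa, dfb = len(a) - 1, len(b) - 1
    # Two-sided p-value: double the smaller tail probability.
    p = 2 * min(stats.f.cdf(f, dfa, dfb), stats.f.sf(f, dfa, dfb))
    return f, p

rng = np.random.default_rng(2)
# Lognormal groups: the group with the larger mean is also more variable.
group1 = np.exp(rng.normal(1.0, 0.5, size=30))
group2 = np.exp(rng.normal(2.0, 0.5, size=30))

f_raw, p_raw = variance_ratio_test(group1, group2)
# After a log transformation the two groups have the same spread.
f_log, p_log = variance_ratio_test(np.log(group1), np.log(group2))
print(round(f_raw, 2), round(f_log, 2))
```

On the raw scale the variance ratio is far from 1; on the log scale it is close to 1, which is exactly what redrawing the comparative box plots after the transformation would show.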
Linearity
- If you have repeated x-values in a simple linear regression, you can compute "Pure Error" and "Lack of Fit" terms in the regression ANOVA to test the assumption that the true relationship is a straight line. A less satisfactory method is to fit a quadratic model and test to see if the fit is significantly better than the linear model.
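The pure-error / lack-of-fit decomposition can be computed directly: the residual sum of squares from the fitted line splits into pure error (variation of y within groups sharing the same x) and lack of fit (the remainder). A minimal sketch in Python (an illustration only; `lack_of_fit_test` is a name invented here):

```python
import numpy as np
from scipy import stats

def lack_of_fit_test(x, y):
    """Pure-error / lack-of-fit F-test for simple linear regression.
    Requires repeated x values so pure error can be estimated."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    slope, intercept = np.polyfit(x, y, 1)      # least-squares line
    ss_resid = np.sum((y - (intercept + slope * x)) ** 2)
    # Pure error: within-group variation at each distinct x value.
    ss_pe, n_groups = 0.0, 0
    for xv in np.unique(x):
        group = y[x == xv]
        ss_pe += np.sum((group - group.mean()) ** 2)
        n_groups += 1
    ss_lof = ss_resid - ss_pe
    df_lof, df_pe = n_groups - 2, n - n_groups
    f = (ss_lof / df_lof) / (ss_pe / df_pe)
    return f, stats.f.sf(f, df_lof, df_pe)      # upper-tail p-value

# Three replicates at each of five x values; the true curve is quadratic,
# so the lack-of-fit test should reject the straight-line model.
x = np.repeat([1, 2, 3, 4, 5], 3)
rng = np.random.default_rng(3)
y = 2 + 0.5 * x + 0.8 * (x - 3) ** 2 + rng.normal(0, 0.3, size=x.size)
f_stat, p_value = lack_of_fit_test(x, y)
print(round(f_stat, 1), round(p_value, 4))
```

The degrees of freedom are (number of distinct x values − 2) for lack of fit and (n − number of distinct x values) for pure error, so the test is only available when there are genuine replicates.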
Robustness
A statistical method is said to be "Robust" if it does what it is supposed to do even if the assumptions aren't satisfied. Generally, methods are more robust in large samples than they are with small samples. This is frustrating, because it is only with large samples that you can test the assumptions.
An example is the t-test. The t distribution is only justified if
the data are independent and normal, but with enough degrees of
freedom the t distribution becomes a standard normal, so it makes
little difference whether you treat the variance as known or
estimated. Hence the assumption that (n-1)s²/σ² follows a
chi-square distribution (which is only true for normal data) is no
longer important.
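The convergence of the t distribution to the standard normal is easy to see numerically. A quick Python check with SciPy (an illustration only) compares the 97.5th percentile of the t distribution, used for a two-sided test at the 5% level, with the normal value of about 1.96:

```python
from scipy import stats

# 97.5th percentile of t for increasing degrees of freedom,
# compared with the standard normal percentile (about 1.96).
for df in (5, 30, 100, 1000):
    print(df, round(stats.t.ppf(0.975, df), 3))
print("normal", round(stats.norm.ppf(0.975), 3))
```

By 30 degrees of freedom the t percentile is already close to 1.96, and by 1000 the difference is negligible, which is why the chi-square assumption about the variance estimate stops mattering in large samples.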