Verifying your Assumptions
Independence
- Review the experimental protocol and check out the equipment
used. Could your subjects confer together and share their
opinions? Did they know whether they were assigned to the
treatment or to the control group? Could your instrumentation lose
its calibration? Was there contamination carried over from one
determination to the next? What you can do here will depend on the
context of the study and on how much information, in addition to
the numerical data, is available to you.
- If you suspect autocorrelation, you can check the data with
lagged scatter plots or with Time Series Analysis, but these
methods are only useful if there are several hundred observations
in the series; a rough sketch of a lagged scatter plot is given
below.
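As a concrete illustration, here is a minimal Python sketch of a
lag-1 scatter plot together with the lag-1 sample autocorrelation.
(The course uses MINITAB; Python with numpy and matplotlib is used
here only for illustration, and the AR(1) series is simulated, so
substitute your own data for y.)

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(1)

    # Simulated AR(1) series so the lag-1 plot shows visible structure;
    # replace y with your own series of several hundred observations.
    n = 500
    y = np.empty(n)
    y[0] = rng.normal()
    for t in range(1, n):
        y[t] = 0.7 * y[t - 1] + rng.normal()

    # Lag-1 scatter plot: each observation against the one before it.
    plt.scatter(y[:-1], y[1:], s=10)
    plt.xlabel("y[t]")
    plt.ylabel("y[t+1]")
    plt.title("Lag-1 scatter plot")
    plt.show()

    # Lag-1 sample autocorrelation; values near 0 are consistent
    # with independence.
    r1 = np.corrcoef(y[:-1], y[1:])[0, 1]
    print(f"lag-1 autocorrelation: {r1:.3f}")

If the data really are independent, the lag-1 scatter plot should
look like a shapeless cloud.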
Normality
- Try histograms, stem-and-leaf plots, box plots, and probability
plots (look for these in MINITAB; they aren't covered in this
course), anything that will indicate whether the shape of the
distribution departs too much from a symmetric bell curve. None of
these plots will tell you much unless you have at least 50
observations.
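Here is a minimal Python sketch of three of these plots drawn side
by side: a histogram, a box plot, and a normal probability plot.
(The data are simulated for illustration; replace x with your own.)

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy import stats

    rng = np.random.default_rng(2)
    x = rng.normal(loc=10, scale=2, size=100)  # replace with your data

    fig, axes = plt.subplots(1, 3, figsize=(12, 4))
    axes[0].hist(x, bins=15)
    axes[0].set_title("Histogram")
    axes[1].boxplot(x)
    axes[1].set_title("Box plot")
    # Probability plot: points close to the line suggest normality.
    stats.probplot(x, dist="norm", plot=axes[2])
    axes[2].set_title("Normal probability plot")
    plt.tight_layout()
    plt.show()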
Homoscedasticity
- If you are comparing means in different groups, you generally
want to assume that the variance is the same in each group. You
can compare two variances using the F-test (a sketch is given
after this list), but that test isn't very powerful unless you
have a large sample in each group, and it isn't even valid unless
you have independent normal data from each group (see above!).
- Grouped Box Plots are a useful way of screening the data to
see if the dispersion is more or less comparable in each group;
the second sketch below draws them before and after a
transformation.
- If the variance increases with the mean (groups with larger
means are more variable), then try log or square-root
transformations and redraw the grouped box plots to see if the
variances are more equal with the transformed data. Of course,
comparing means of transformed data isn't the same as comparing
means in the original scale, but that is an unavoidable problem
with transformations.
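To make the F-test concrete, here is a minimal Python sketch using
scipy. (The two groups are simulated for illustration; substitute
your own data for a and b.)

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    a = rng.normal(10, 2, size=40)  # replace with your own groups
    b = rng.normal(10, 3, size=35)

    s2a, s2b = np.var(a, ddof=1), np.var(b, ddof=1)

    # Put the larger sample variance in the numerator so that F >= 1.
    if s2a >= s2b:
        F, df1, df2 = s2a / s2b, len(a) - 1, len(b) - 1
    else:
        F, df1, df2 = s2b / s2a, len(b) - 1, len(a) - 1

    # Two-sided p-value from the upper tail of the F distribution.
    p = min(2 * stats.f.sf(F, df1, df2), 1.0)
    print(f"F = {F:.3f} on ({df1}, {df2}) df, p = {p:.4f}")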
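And here is a sketch of grouped box plots drawn before and after a
log transformation. The three groups are simulated as lognormal so
that the spread grows with the mean, which is exactly the situation
the transformation is meant to fix.

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(4)
    # Simulated lognormal groups: larger means, larger spreads.
    groups = [rng.lognormal(mean=m, sigma=0.5, size=50)
              for m in (1.0, 1.5, 2.0)]

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.boxplot(groups)
    ax1.set_xticklabels(["A", "B", "C"])
    ax1.set_title("Original scale: spread grows with the mean")
    ax2.boxplot([np.log(g) for g in groups])
    ax2.set_xticklabels(["A", "B", "C"])
    ax2.set_title("Log scale: spreads roughly equal")
    plt.tight_layout()
    plt.show()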
Robustness
A statistical method is said to be "Robust" if it does what it is
supposed to do even if the assumptions aren't satisfied. Generally,
methods are more robust in large samples than in small samples.
This is frustrating, because it is only with large samples that you
can test the assumptions.
An example is the t-test. The t distribution is only justified if
the data are independent and normal, but with enough degrees of
freedom the t distribution becomes effectively a standard normal.
Then it doesn't make much difference whether you treat the variance
as known or estimated, and the assumption that (n-1)s^2/sigma^2 has
a Chi-square distribution (which is only true for normal data) is
no longer important.
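You can see this convergence numerically. Here is a small Python
sketch comparing upper 2.5% critical values of the t distribution
with the standard normal value of about 1.96:

    from scipy import stats

    # As the degrees of freedom grow, the t critical value
    # approaches the standard normal critical value.
    for df in (2, 5, 10, 30, 100, 1000):
        print(f"df = {df:4d}: t critical value = "
              f"{stats.t.ppf(0.975, df):.4f}")
    print(f"standard normal critical value = "
          f"{stats.norm.ppf(0.975):.4f}")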