R FUNCTIONS FOR EXPLORATORY DATA ANALYSIS...

2006-09-27
ONE SAMPLE, ONE VARIABLE

Histogram: hist(); Stem-and-leaf plot: stem(); Box plot: boxplot(); Probability plot: qqnorm(), qqline().
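
A minimal sketch of these four displays on one sample. The simulated data and the name x are assumptions made for illustration only.

```r
# Simulated sample; the values and the name x are illustrative assumptions
set.seed(1)
x <- rnorm(50, mean = 10, sd = 2)

h <- hist(x)      # histogram; the returned object holds the breaks and counts
stem(x)           # stem-and-leaf plot, printed in the console
b <- boxplot(x)   # box plot; b$stats holds the five values that are drawn
qqnorm(x)         # normal probability plot
qqline(x)         # reference line through the first and third quartiles
```

Note that qqline() adds to the plot made by qqnorm(), so it must be called second.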

TIME SERIES

Time series plot: plot(); Smoothed time series plot: plot(), esmooth(); Lag plot: lag.plot().

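esmooth <- function (series, alpha = 0.1) 
{
    # Exponentially smooth a series; a missing value repeats the previous smoothed value.
    # If series[1] is missing, every smoothed value is NA.
    esseries <- series
    for (t in seq_along(series)[-1]) {
        if (is.na(series[t])) esseries[t] <- esseries[t - 1]
        else esseries[t] <- alpha * series[t] + (1 - alpha) * esseries[t - 1]
    }
    esseries
}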
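
One possible way to use esmooth() together with the plotting functions listed above. The built-in Nile series and the choice alpha = 0.2 are assumptions made for illustration; the function is repeated in compact form only so that the example runs on its own.

```r
# esmooth() as defined above, repeated so this example is self-contained
esmooth <- function(series, alpha = 0.1) {
    esseries <- series
    for (t in seq_along(series)[-1]) {
        if (is.na(series[t])) esseries[t] <- esseries[t - 1]
        else esseries[t] <- alpha * series[t] + (1 - alpha) * esseries[t - 1]
    }
    esseries
}

# Small check by hand: the missing third value repeats the smoothed value 15
esmooth(c(10, 20, NA, 40), alpha = 0.5)   # 10.0 15.0 15.0 27.5

# Nile is a built-in annual series, used here only as sample data
plot(Nile)                                  # time series plot
lines(esmooth(Nile, alpha = 0.2), lty = 2)  # smoothed series overlaid as a dashed line
lag.plot(Nile)                              # each value against the previous one
```

A larger alpha follows the data more closely; a smaller alpha gives a smoother curve.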

TWO CONTINUOUS VARIABLES

Scatterplot: plot(); Fitted line: lm(), abline().
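
A short sketch of a scatterplot with a fitted least-squares line. The built-in cars data set is an assumption made for illustration.

```r
# cars is a built-in data set (speed and stopping distance), used as sample data
plot(cars$speed, cars$dist)          # scatterplot
fit <- lm(dist ~ speed, data = cars) # least-squares line of dist on speed
abline(fit)                          # add the fitted line to the current plot
coef(fit)                            # intercept and slope
```

abline() draws on the plot that is already open, so plot() has to come first.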

MORE THAN TWO CONTINUOUS VARIABLES

Scatterplot matrix: pairs().
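
A minimal example, assuming the built-in iris data as sample data; the name measurements is hypothetical.

```r
# iris is a built-in data set; its first four columns are continuous
measurements <- iris[, 1:4]
pairs(measurements)   # one scatterplot for each pair of variables
```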

ONE-FACTOR DESIGN

Comparative box plots: boxplot(), split().
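
A sketch of comparative box plots for one factor, assuming the built-in InsectSprays data (insect counts for six spray types) as sample data.

```r
# split() makes a list with one vector of counts per level of the factor
groups <- split(InsectSprays$count, InsectSprays$spray)

# boxplot() draws one box per list element, side by side
b <- boxplot(groups, xlab = "spray", ylab = "count")
```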

TWO-FACTOR DESIGN

Comparative box plots: boxplot(), split(); Interaction plot: interaction.plot().
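
A sketch for two factors, assuming the built-in ToothGrowth data (response len, factors supp and dose) as sample data; the names tg and cells are hypothetical.

```r
tg <- ToothGrowth   # built-in data set, used only for illustration

# Passing a list of two factors to split() gives one vector per combination
cells <- split(tg$len, list(tg$supp, tg$dose))
boxplot(cells)      # one box per supp-by-dose combination

# Mean of len against dose, with a separate line for each level of supp
interaction.plot(tg$dose, tg$supp, tg$len)
```

Roughly parallel lines in the interaction plot suggest little interaction between the two factors.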

GRAPHICS PARAMETERS

To see documentation on the optional graphics parameters, type ?par.

CREATING AND SIZING THE GRAPHICS WINDOW

You will get better-looking graphs if you don't resize them after they are created. In Windows, use windows(h=4, w=6) to create a graphics window 4 inches high by 6 inches wide; this will usually be the right size to paste into a Word document. For a scatterplot matrix, you might want windows(h=6, w=6). To make two small graphs that will fit side by side, you could try windows(h=3, w=3). The corresponding function in Mac OS is quartz(). If you have more than one graphics window open, each is assigned a number, and you can use dev.set() to choose which one is active.
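
A sketch of working with two graphics devices at once. As an assumption for illustration, dev.new() stands in for windows() or quartz() so that the same lines run on any platform; the names first, second and active are hypothetical.

```r
# dev.new() opens the default device for the platform; sizes are in inches
dev.new(height = 4, width = 6)   # first device, 4 inches high by 6 inches wide
first <- dev.cur()               # number assigned to the first device
dev.new(height = 3, width = 3)   # second device; it becomes the active one
second <- dev.cur()

dev.list()         # numbers of all open devices
dev.set(first)     # make the first device active again
hist(rnorm(100))   # this plot is drawn on the first device
active <- dev.cur()

dev.off(second)    # close both devices when finished
dev.off(first)
```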

To make nice graphics files for the web or publication, try pdf() or jpeg().
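
A minimal sketch of writing a graph to a PDF file. The file name example.pdf and the temporary directory are assumptions made for illustration.

```r
# Hypothetical output file in the session's temporary directory
out <- file.path(tempdir(), "example.pdf")

pdf(out, height = 4, width = 6)   # open a PDF device; sizes are in inches
hist(rnorm(100))                  # plotting commands now draw into the file
dev.off()                         # close the device to finish writing the file
```

The file is not complete until dev.off() has been called.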

STATISTICAL CALCULATION

median(), quantile(), mean(), sd(), var(), cor(), lm().
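
A brief illustration of these functions on two short vectors. The values are made up for illustration.

```r
# Made-up values, for illustration only
x <- c(2, 4, 4, 5, 7, 9)
y <- c(1, 3, 2, 5, 6, 8)

median(x)          # 4.5, the average of the two middle values
quantile(x)        # minimum, quartiles and maximum
mean(x)            # sample mean
sd(x)              # sample standard deviation
var(x)             # sample variance, equal to sd(x)^2
cor(x, y)          # correlation coefficient
fit <- lm(y ~ x)   # least-squares regression of y on x
coef(fit)          # intercept and slope
```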

DATA ENTRY

read.table(), data.frame(), c(), rep().
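
A small sketch of entering data by hand and of reading it from text. The variable names and values are made up; the text argument of read.table() is used here in place of a file so that the example needs no external data.

```r
# Typing data in directly
treatment <- rep(c("A", "B"), each = 3)        # "A" "A" "A" "B" "B" "B"
response <- c(5.1, 4.8, 5.5, 6.2, 6.0, 6.4)    # made-up measurements
d <- data.frame(treatment, response)           # one row per observation

# Reading a table with a header line; with a real file, give its name instead of text
d2 <- read.table(text = "treatment response
A 5.1
B 6.2", header = TRUE)
```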

ARRAYS AND MATRICES AND DATA FRAMES

names(), dim(), nrow(), ncol(), diag().
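
A short illustration on a small matrix and a small data frame, both made up for this example.

```r
m <- matrix(1:6, nrow = 2)   # 2 by 3 matrix, filled column by column
dim(m)                       # 2 3
nrow(m)                      # 2
ncol(m)                      # 3

sq <- matrix(1:9, nrow = 3)
diag(sq)                     # diagonal entries: 1 5 9
diag(3)                      # 3 by 3 identity matrix

d <- data.frame(x = 1:3, y = c(2.5, 3.1, 4.0))
names(d)                     # column names: "x" "y"
dim(d)                       # 3 2
```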

CALCULATION ON LISTS OR ARRAYS

apply(), lapply(), sapply().
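
A minimal sketch of the three functions. The matrix m and the list samples are made up for illustration.

```r
m <- matrix(1:6, nrow = 2)        # rows are 1 3 5 and 2 4 6
row_sums <- apply(m, 1, sum)      # apply over rows: 9 12
col_means <- apply(m, 2, mean)    # apply over columns: 1.5 3.5 5.5

samples <- list(a = c(1, 2, 3), b = c(10, 20))
as_list <- lapply(samples, mean)     # result is a list with elements a and b
as_vector <- sapply(samples, mean)   # result is a named vector: a = 2, b = 15
```

lapply() always returns a list; sapply() simplifies the result to a vector or matrix when it can.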


Statistics 3N03/3J04