Getting Started in Splus and R

2000-09-18


The Basics

Spend some time with the On-line Splus Tutorial from Dalhousie University. If you are going to be using the UNIX versions of Splus, read the relevant sections of Running Splus at McMaster.

Setting your Working Directories in Windows

Copy the shortcut to R from the course folder to the desktop. Edit the Properties of the new shortcut, so that R will "Start in" a folder of your choosing, such as "D:\Temp\myRproject1". Save the workspace image to this folder. After your session, copy the whole folder myRproject1 to a floppy disk or, by FTP, to a remote server. To begin a new session where you left off, copy myRproject1 from the floppy disk or remote server to the hard drive of the machine you are on.

The same procedure will work for Splus 2000, but by default Splus will start in D:\Temp\sp2000\userid so you will not usually have to set the "Start in" folder to anything else.

Random Data, Histograms, and Density Functions

Plot a line graph of the standard normal probability density function and add a title to the graph.

> xs <- seq(-4, 4, length = 100)
> plot(xs, dnorm(xs), type = "l", main = "Standard Normal pdf")

Generate 100 observations from a standard normal distribution, compute the mean and standard deviation, and plot a histogram.

> xx <- rnorm(100)
> mean(xx)
[1] 0.1211835
> sqrt(var(xx))
[1] 1.181486
> hist(xx)

Plot a histogram with break points from -4 to +4 in steps of 0.5.

> xb <- seq(-4, 4, by = 0.5)
> hist(xx, breaks = xb)

Add a standard normal density function to this plot, scaled to match the area under the histogram.

> lines(xs, 100*0.5*dnorm(xs))

Note that plot() initiates a new plot but lines() adds lines to an existing plot. Explain the scaling factor.

Repeat for a sample of 1000 standard normal observations.

Generate 100 observations from a gamma distribution with shape parameter 0.5; compute the sample mean and standard deviation and plot a histogram with an appropriately scaled gamma density function superimposed. Repeat for 1000 observations.

Exploratory Data Analysis of Forestry Remote Sensing

Each observation is a stand of trees observed by satellite and on the ground.

  ht    Mean height
  dbh   Mean diameter (at breast height)
  bnd5  Satellite Band 5
  cci   Crown cover index (fraction of ground covered by the canopy)
  type  "Damaged" (road construction, wind damage, etc.) or "Undamaged".

Get the data and save it in a file called trees.txt. Use

> trees <- read.table("trees.txt",header=T)

to read data from the file into a data frame called trees. Attach the data frame to your session. Confirm the names of the 5 variables. Look at the first 6 rows to see what is there. Draw a scatterplot matrix for all variables, then for the 4 quantitative variables, for all stands of trees. Draw a scatterplot matrix for the 4 quantitative variables for the undamaged stands alone, and for the damaged stands alone.

> attach(trees)
> dimnames(trees)[[2]]
[1] "ht"   "dbh"  "bnd5" "cci"  "type"
 
> trees[1:6,]
    ht dbh bnd5        cci      type
1 1.00  NA  102 0.04146902 undamaged
2 1.21  NA   99 0.05835823 undamaged
3 1.21  NA   99 0.06029240 undamaged
4 1.18  NA  102 0.06414085 undamaged
5 1.03  NA  103 0.04427970 undamaged
6 1.06  NA  104 0.05017752 undamaged
 
> pairs(trees)
> pairs(trees[, 1:4])
> pairs(trees[type == "undamaged", 1:4])
> pairs(trees[type == "damaged", 1:4])
 

There are some missing values in dbh. Does pairs() eliminate missing vales pairwise or listwise? One way to find out would be to do a pairs plot of the first 71 rows.

Draw a box plot to compare crown cover index in the damaged and undamaged stands.

Because we have attached the data frame, we can refer directly to the variables within it, i.e. we can write cci instead of trees$cci.

> boxplot(split(cci, type))

What have you learned about the trees? Can the satellite readings predict crown cover? Discuss any special features you see in the data.


Writing a Function

Use fix(by2) to write a function by2() which multiplies a given value by 2. Note that the value returned by a function is the last value computed (unlike other programming languages which require an assignment statement) and the default value of the argument is 1.

> by2
function(x = 1)
{
        2 * x
}
 
> by2()
[1] 2
> by2(99)
[1] 198

Use fix(norm3d) to write a function to prepare a grid, compute the standard bivariate normal probability density function over the grid, and display it as either a perspective plot or a contour plot.

Note that the scaling factor scl could be omitted in this calculation as it does not affect the shape of the distribution. Look at the first few rows of xynor; what did expand.grid do? What is the purpose of invisible() on the last line?

> norm3d
function(view = "p")
{
        xnor <- seq(-3., 3., by = 0.1)
        ynor <- xnor
        xynor <- expand.grid(list(xnor, ynor))
        scl <- 1/(2 * pi)
        znor <- matrix(scl * exp(-0.5 * (xynor[, 1.]^2. + xynor[, 2.]^2.)),
                nrow = length(xnor))
        if(view == "p")
                persp(xnor, ynor, znor)
        if(view == "c")
                contour(xnor, ynor, znor)
        invisible()
}

Plotting an Ellipse

Try the example discussed in class, Drawing an ellipse in Splus or R.


Spin Plots

Try spin plots for the trees data, taking variables 3 at a time.


Another Multivariate Example

Try Exercise 1.14 from the text. The data are on the data disk that comes with the book.


Stars and Faces

Try Exercises 1.24 and 1.25. The data are on the data disk.


Statistics 4M03/6M03
Last modified 2000-09-20 13:20