Examples in R

2001-01-09


These examples were discussed in the lectures on 2001-01-05 and 2001-01-09.


R : Copyright 2000, The R Development Core Team
Version 1.1.0 (June 15, 2000)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type "?license" or "?licence" for distribution details.

R is a collaborative project with many contributors.
Type "?contributors" for a list.

Type "demo()" for some demos, "help()" for on-line help, or
"help.start()" for a HTML browser interface to help.
Type "q()" to quit R.

[Previously saved workspace restored]

Generate consecutive integers

> 1:5
[1] 1 2 3 4 5
> 1:20
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
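
The colon operator always steps by one. As a small sketch that goes slightly beyond the lecture examples, a descending range comes from reversing the endpoints, and seq() handles other step sizes. The names down, evens and grid are only illustrative.

```r
# The colon operator counts down when the first value is the larger one
down <- 5:1
# Step sizes other than 1 need seq(); 'by' sets the increment
evens <- seq(2, 10, by = 2)
# 'length.out' fixes the number of points instead of the step size
grid <- seq(0, 1, length.out = 5)

print(down)   # 5 4 3 2 1
print(evens)  # 2 4 6 8 10
print(grid)   # 0.00 0.25 0.50 0.75 1.00
```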

Assign values; read the following line as "a gets 5". Note that in this version of R (1.1.0) you can type _ instead of <-, but I think <- is easier to read; later releases of R no longer accept _ for assignment.

> a <- 5
> a
[1] 5
> b <- 1:15
> b
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
> xx <- c(2.4,5.2,-5.1,0,5,2,2.5,3.1)
> xx
[1]  2.4  5.2 -5.1  0.0  5.0  2.0  2.5  3.1

Commonly used functions and arithmetic. Note that arithmetic on vectors is elementwise.

> length(xx)
[1] 8
> yy <- 1:8
> yy
[1] 1 2 3 4 5 6 7 8
> xx+yy
[1]  3.4  7.2 -2.1  4.0 10.0  8.0  9.5 11.1
> xx*yy
[1]   2.4  10.4 -15.3   0.0  25.0  12.0  17.5  24.8
> xx^2
[1]  5.76 27.04 26.01  0.00 25.00  4.00  6.25  9.61
> xx/yy
[1]  2.4000000  2.6000000 -1.7000000  0.0000000  1.0000000  0.3333333  0.3571429
[8]  0.3875000
> log(xx)
[1] 0.8754687 1.6486586       NaN      -Inf 1.6094379 0.6931472 0.9162907
[8] 1.1314021
Warning message:
NaNs produced in: log(x)
> exp(xx)
[1] 1.102318e+01 1.812722e+02 6.096747e-03 1.000000e+00 1.484132e+02
[6] 7.389056e+00 1.218249e+01 2.219795e+01
> sin(xx)
[1]  0.67546318 -0.88345466  0.92581468  0.00000000 -0.95892427  0.90929743
[7]  0.59847214  0.04158066
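
Elementwise arithmetic also works when one operand is a single number, because the shorter operand is recycled. The sketch below reuses the values of xx and, as an illustration that was not part of the lecture, takes logs of the positive elements only, which is one way to avoid the NaN warning seen above.

```r
xx <- c(2.4, 5.2, -5.1, 0, 5, 2, 2.5, 3.1)

# The single number 2 is recycled to match the length of xx
doubled <- 2 * xx
# mean(xx) is one number, so it is subtracted from every element
centred <- xx - mean(xx)
# Keep only the positive elements before taking logs: no NaN, no -Inf
pos_logs <- log(xx[xx > 0])

print(doubled)
print(centred)
print(length(pos_logs))  # 6
```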

Subscripts

> xx
[1]  2.4  5.2 -5.1  0.0  5.0  2.0  2.5  3.1
> xx[2]
[1] 5.2
> xx[2:7]
[1]  5.2 -5.1  0.0  5.0  2.0  2.5
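
Subscripts can be any vector of positions, a negative subscript drops elements, and a subscript on the left of <- replaces elements. A minimal sketch with the same xx; the copy xx_fixed is a hypothetical name used so that xx itself is left unchanged.

```r
xx <- c(2.4, 5.2, -5.1, 0, 5, 2, 2.5, 3.1)

# A vector of positions: first and last elements
first_last <- xx[c(1, length(xx))]
# A negative subscript drops that position (here the third element, -5.1)
without_third <- xx[-3]
# Assigning to a subscript replaces the element in a copy of xx
xx_fixed <- xx
xx_fixed[3] <- 5.1

print(first_last)     # 2.4 3.1
print(without_third)  # 2.4 5.2 0.0 5.0 2.0 2.5 3.1
print(xx_fixed[3])    # 5.1
```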

Statistical functions

> mean(xx)
[1] 1.8875
> var(yy)
[1] 6
> sort(xx)
[1] -5.1  0.0  2.0  2.4  2.5  3.1  5.0  5.2
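
var() computes the sample variance with divisor n-1. As a check by hand, the sketch below reproduces var(yy) = 6 from the definition and shows a few related summary functions.

```r
yy <- 1:8
n <- length(yy)

# Sample variance from the definition: sum of squared deviations over n - 1
by_hand <- sum((yy - mean(yy))^2) / (n - 1)

print(by_hand)     # 6
print(var(yy))     # 6
print(sd(yy))      # the square root of the variance
print(median(yy))  # 4.5
print(range(yy))   # 1 8
```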

Inputting data to a vector; note the "+" prompt which indicates that the line is incomplete and more typing is expected.

> # Data on motor octane ratings for various gasoline blends
> x <- c(88.5,87.7,83.4,86.7,87.5,91.5,88.6,100.3,
+ 95.6,93.3,94.7,91.1,91.0,94.2,87.5,89.9,
+ 88.3,87.6,84.3,86.7,88.2,90.8,88.3,98.8,
+ 94.2,92.7,93.2,91.0,90.3,93.4,88.5,90.1,
+ 89.2,88.3,85.3,87.9,88.6,90.9,89.0,96.1,
+ 93.3,91.8,92.3,90.4,90.1,93.0,88.7,89.9,
+ 89.8,89.6,87.4,88.9,91.2,89.3,94.4,92.7,
+ 91.8,91.6,90.4,91.1,92.6,89.8,90.6,91.1,
+ 90.4,89.3,89.7,90.3,91.6,90.5,93.7,92.7,
+ 92.2,92.2,91.2,91.0,92.2,90.0,90.7)
> x
 [1]  88.5  87.7  83.4  86.7  87.5  91.5  88.6 100.3  95.6  93.3  94.7  91.1
[13]  91.0  94.2  87.5  89.9  88.3  87.6  84.3  86.7  88.2  90.8  88.3  98.8
[25]  94.2  92.7  93.2  91.0  90.3  93.4  88.5  90.1  89.2  88.3  85.3  87.9
[37]  88.6  90.9  89.0  96.1  93.3  91.8  92.3  90.4  90.1  93.0  88.7  89.9
[49]  89.8  89.6  87.4  88.9  91.2  89.3  94.4  92.7  91.8  91.6  90.4  91.1
[61]  92.6  89.8  90.6  91.1  90.4  89.3  89.7  90.3  91.6  90.5  93.7  92.7
[73]  92.2  92.2  91.2  91.0  92.2  90.0  90.7
> length(x)
[1] 79
> mean(x);var(x)
[1] 90.66709
[1] 7.895057
> stem(x)
 
  The decimal point is at the |
 
   82 | 4
   84 | 33
   86 | 77455679
   88 | 23335566790233678899
   90 | 01133444567890001112256688
   92 | 22236777023347
   94 | 22476
   96 | 1
   98 | 8
  100 | 3
 

Typing the name of a function without parentheses returns the code for the function, just as typing the name of any object returns the content of that object.

> mean
function (x, ...)
UseMethod("mean")

Note that x>90 is a logical vector; for purposes of arithmetic, "TRUE" counts as "1" and "FALSE" counts as "0", so mean(x>90) returns the proportion of elements greater than 90 in the vector x.

> mean(x>90)
[1] 0.5949367

x[x>90] selects those elements of x that are > 90.

> mean(x[x>90])
[1] 92.3468
> length(x[x>90])
[1] 47
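
The same idea on a vector small enough to check by eye. The vector z below is a hypothetical five-value sample, not the full octane data.

```r
z <- c(88.5, 91.5, 100.3, 87.7, 93.3)

# Comparison gives one TRUE/FALSE per element
big <- z > 90
print(big)         # FALSE TRUE TRUE FALSE TRUE
# TRUE counts as 1, so sum() counts and mean() gives the proportion
print(sum(big))    # 3
print(mean(big))   # 0.6
# A logical subscript keeps the elements where the condition is TRUE
print(z[big])      # 91.5 100.3 93.3
# which() gives the positions instead of the values
print(which(big))  # 2 3 5
```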

A data frame is a rectangular structure; the rows correspond to cases, items or subjects, and the columns correspond to variables. Note how x2 is made a factor; that is, a categorical variable indicating which treatment group each subject belongs to.

> mydata <- data.frame(y=c(1.2,3.6,5.1,4.2,2.1),
+ x1=c(1.5,2.5,6,3.1,2.2),x2=factor(c(1,1,1,2,2))
+ )
> mydata
    y  x1 x2
1 1.2 1.5  1
2 3.6 2.5  1
3 5.1 6.0  1
4 4.2 3.1  2
5 2.1 2.2  2
> mydata$y
[1] 1.2 3.6 5.1 4.2 2.1
> dimnames(mydata)
[[1]]
[1] "1" "2" "3" "4" "5"
 
[[2]]
[1] "y"  "x1" "x2"
 
> dimnames(mydata)[[2]]
[1] "y"  "x1" "x2"
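
A few more ways to inspect and subset a data frame, sketched with the same mydata. The name group2 is only illustrative.

```r
mydata <- data.frame(y  = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))

print(names(mydata))      # "y" "x1" "x2"
print(dim(mydata))        # 5 3 (rows, columns)
print(levels(mydata$x2))  # "1" "2"

# Row subscript before the comma, column subscript after it
print(mydata[2, "x1"])    # 2.5
# A logical row subscript keeps the subjects in treatment group 2
group2 <- mydata[mydata$x2 == "2", ]
print(group2)
```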

A list is more general than a data frame; it is an object that is a collection of other objects. Note the different ways to refer to individual components of the list.

> mylist <- list(aa=c(2,44),bb=mydata)
> mylist
$aa
[1]  2 44
 
$bb
    y  x1 x2
1 1.2 1.5  1
2 3.6 2.5  1
3 5.1 6.0  1
4 4.2 3.1  2
5 2.1 2.2  2
 
> mylist$bb$x1
[1] 1.5 2.5 6.0 3.1 2.2
> mylist[[2]]
    y  x1 x2
1 1.2 1.5  1
2 3.6 2.5  1
3 5.1 6.0  1
4 4.2 3.1  2
5 2.1 2.2  2
> mylist[[2]][3,2]
[1] 6
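
To make the difference between the subscript forms explicit, here is a minimal sketch. It uses a shortened two-row data frame for the bb component, an assumption made only to keep the example small.

```r
mylist <- list(aa = c(2, 44),
               bb = data.frame(y = c(1.2, 3.6), x1 = c(1.5, 2.5)))

print(names(mylist))   # "aa" "bb"

a1 <- mylist$aa        # component by name with $
a2 <- mylist[["aa"]]   # component by name with double brackets
a3 <- mylist[[1]]      # component by position
# Single brackets return a list containing the component, not the component itself
sub <- mylist["aa"]

print(identical(a1, a2) && identical(a2, a3))  # TRUE
print(is.list(sub))                            # TRUE
print(length(sub))                             # 1
```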

The example below creates a simple one-line function at the prompt. For anything longer, a better way to write a new function or edit an existing one is fix(myfun); this command opens an editor.

> myfun <- function(x) 3*x^2
> myfun
function(x) 3*x^2
> myfun(2)
[1] 12
> myfun(0:10)
 [1]   0   3  12  27  48  75 108 147 192 243 300
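
A function with more than one statement puts its body inside braces, and arguments can be given default values. The function mypoly and its arguments a and p are hypothetical, chosen so that the defaults reproduce myfun.

```r
# a and p have defaults, so mypoly(x) computes 3*x^2 like myfun(x)
mypoly <- function(x, a = 3, p = 2) {
  result <- a * x^p
  result   # the value of the last expression is returned
}

print(mypoly(2))                # 12
print(mypoly(0:3))              # 0 3 12 27
print(mypoly(2, a = 1, p = 3))  # 8
```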
 

In practice, it is advisable to edit the data in a spreadsheet or database program, then export it as a plain text file and import it into R. Note that read.table() reads x2 as numeric, as the test with is.factor() shows, so x2 is made a factor after importing.

TEXT FILE: "mydata.txt"
 
    y  x1 x2
 1.2 1.5  1
 3.6 2.5  1
 5.1 6.0  1
 4.2 3.1  2
 2.1 2.2  2
 
> mydata1 <- read.table("mydata.txt", header=TRUE)
> mydata1
    y  x1 x2
1 1.2 1.5  1
2 3.6 2.5  1
3 5.1 6.0  1
4 4.2 3.1  2
5 2.1 2.2  2
> is.factor(mydata1$x2)
[1] FALSE
> mydata1$x2 <- factor(mydata1$x2)
> is.factor(mydata1$x2)
[1] TRUE
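
To try the import without creating mydata.txt by hand, the sketch below writes the same five rows to a temporary file first. The temporary file is an assumption made so the example is self-contained; with a real file you would pass its name to read.table() directly.

```r
# Write the text file shown above to a temporary location
path <- tempfile(fileext = ".txt")
writeLines(c("y x1 x2",
             "1.2 1.5 1",
             "3.6 2.5 1",
             "5.1 6.0 1",
             "4.2 3.1 2",
             "2.1 2.2 2"), path)

# header = TRUE takes the variable names from the first line
mydata1 <- read.table(path, header = TRUE)
before <- is.factor(mydata1$x2)    # FALSE: x2 was read as a number
mydata1$x2 <- factor(mydata1$x2)
after <- is.factor(mydata1$x2)     # TRUE

print(c(before, after))
unlink(path)   # remove the temporary file
```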

Use split() to break down the data into treatment groups. Note that the result of split() is a list.

> split(mydata$y,mydata$x2)
$"1"
[1] 1.2 3.6 5.1
 
$"2"
[1] 4.2 2.1
 
> lapply(split(mydata$y,mydata$x2),mean)
$"1"
[1] 3.3
 
$"2"
[1] 3.15
 
> sapply(split(mydata$y,mydata$x2),mean)
   1    2
3.30 3.15
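
tapply() combines the split and apply steps in one call. This is an alternative that was not shown in the lecture; it is sketched here with the same y and x2 values.

```r
y  <- c(1.2, 3.6, 5.1, 4.2, 2.1)
x2 <- factor(c(1, 1, 1, 2, 2))

# Apply mean() to y within each level of x2
group_means <- tapply(y, x2, mean)
# table() counts the subjects in each group
group_sizes <- table(x2)

print(group_means)  # 3.30 for group 1, 3.15 for group 2
print(group_sizes)  # 3 in group 1, 2 in group 2
```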

Some graphics to try

> boxplot(split(mydata$y,mydata$x2))
> plot(mydata$x1,mydata$y)
> pairs(mydata)
> xgrid <- seq(-3,3,length=100)
> plot(xgrid, dnorm(xgrid), type="l")

Keeping track of your workspace

> search()
> objects()
> attach(mydata)
> search()
> objects(1)
> objects(2)
> detach(2)
> search()
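
attach() puts the columns of mydata on the search path until detach() is called. As an alternative not covered in the lecture, with() makes the columns visible for a single expression only, so there is nothing to detach afterwards. A minimal sketch:

```r
mydata <- data.frame(y  = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))

# Refer to the column through the data frame
m1 <- mean(mydata$y)
# Evaluate the expression with the columns of mydata visible by name
m2 <- with(mydata, mean(y))

print(m1)  # 3.24
print(m2)  # 3.24
# The search path is unchanged: mydata was never attached
print("mydata" %in% search())  # FALSE
```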

Online help

> ?mean

Statistics 2MA3 2000-2001