Examples in R
2001-01-09
These examples were discussed in the lectures on 2001-01-05 and
2001-01-09.
R : Copyright 2000, The R Development Core Team
Version 1.1.0 (June 15, 2000)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type "?license" or "?licence" for distribution details.
R is a collaborative project with many contributors.
Type "?contributors" for a list.
Type "demo()" for some demos, "help()" for on-line help, or
"help.start()" for a HTML browser interface to help.
Type "q()" to quit R.
[Previously saved workspace restored]
Generate consecutive integers
> 1:5
[1] 1 2 3 4 5
> 1:20
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
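The colon operator used above is shorthand for an integer sequence with step 1. This short sketch uses only base R. It checks that `:` agrees with seq(), and that it counts downwards when the first value is larger than the second.

```r
# 1:5 builds the integer vector 1, 2, 3, 4, 5
up <- 1:5
# seq() with its default step of 1 gives the same values
same <- seq(1, 5)
# a larger start than end counts down instead of up
down <- 5:1
print(up)
print(down)
```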
Assign values; read the following line as
"a gets 5". In this version of R you can also type _ instead of <-,
but <- is easier to read, and later versions of R no longer accept _
for assignment.
> a <- 5
> a
[1] 5
> b <- 1:15
> b
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
> xx <- c(2.4,5.2,-5.1,0,5,2,2.5,3.1)
> xx
[1] 2.4 5.2 -5.1 0.0 5.0 2.0 2.5 3.1
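Commonly used functions and arithmetic.
Note that arithmetic on vectors is elementwise: xx+yy adds the first
element of xx to the first element of yy, the second to the second,
and so on.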
> length(xx)
[1] 8
> yy <- 1:8
> yy
[1] 1 2 3 4 5 6 7 8
> xx+yy
[1] 3.4 7.2 -2.1 4.0 10.0 8.0 9.5 11.1
> xx*yy
[1] 2.4 10.4 -15.3 0.0 25.0 12.0 17.5 24.8
> xx^2
[1] 5.76 27.04 26.01 0.00 25.00 4.00 6.25 9.61
> xx/yy
[1] 2.4000000 2.6000000 -1.7000000 0.0000000 1.0000000 0.3333333 0.3571429
[8] 0.3875000
> log(xx)
[1] 0.8754687 1.6486586 NaN -Inf 1.6094379 0.6931472 0.9162907
[8] 1.1314021
Warning message:
NaNs produced in: log(x)
> exp(xx)
[1] 1.102318e+01 1.812722e+02 6.096747e-03 1.000000e+00 1.484132e+02
[6] 7.389056e+00 1.218249e+01 2.219795e+01
> sin(xx)
[1] 0.67546318 -0.88345466 0.92581468 0.00000000 -0.95892427 0.90929743
[7] 0.59847214 0.04158066
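The following sketch makes "elementwise" concrete. It recomputes xx*yy with an explicit loop over positions and checks that the loop matches the vectorized result. It also checks the two special values that log() returned above: NaN for a negative argument and -Inf for zero.

```r
xx <- c(2.4, 5.2, -5.1, 0, 5, 2, 2.5, 3.1)
yy <- 1:8
# vectorized: element i of xx is multiplied by element i of yy
vec <- xx * yy
# the same computation written as an explicit loop over positions
loop <- numeric(length(xx))
for (i in seq_along(xx)) loop[i] <- xx[i] * yy[i]
# log() of a negative number is NaN (with a warning); log(0) is -Inf
lg <- suppressWarnings(log(xx))
print(vec)
```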
Subscripts
> xx
[1] 2.4 5.2 -5.1 0.0 5.0 2.0 2.5 3.1
> xx[2]
[1] 5.2
> xx[2:7]
[1] 5.2 -5.1 0.0 5.0 2.0 2.5
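A subscript can be any vector of positions, not only a contiguous range. This base R sketch picks out non-adjacent elements in a chosen order. It also checks that xx[2:7] has six elements.

```r
xx <- c(2.4, 5.2, -5.1, 0, 5, 2, 2.5, 3.1)
# a single position
second <- xx[2]
# a contiguous range: positions 2 through 7
middle <- xx[2:7]
# any vector of positions works, in any order
picked <- xx[c(8, 1, 3)]
print(picked)
```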
Statistical functions
> mean(xx)
[1] 1.8875
> var(yy)
[1] 6
> sort(xx)
[1] -5.1 0.0 2.0 2.4 2.5 3.1 5.0 5.2
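var() returns the sample variance, which divides by n - 1 rather than n. This sketch checks the output above by computing the same quantity from the definition for yy = 1:8. In that case the squared deviations from the mean sum to 42, and 42 / 7 = 6.

```r
yy <- 1:8
n <- length(yy)
# sum of squared deviations from the mean (42 for 1:8)
ss <- sum((yy - mean(yy))^2)
# the sample variance divides by n - 1
by_hand <- ss / (n - 1)
print(by_hand)
```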
Inputting data to a vector; note the "+"
prompt, which indicates that the line is incomplete and R expects
more input.
> # Data on motor octane ratings for various gasoline blends
> x <- c(88.5,87.7,83.4,86.7,87.5,91.5,88.6,100.3,
+ 95.6,93.3,94.7,91.1,91.0,94.2,87.5,89.9,
+ 88.3,87.6,84.3,86.7,88.2,90.8,88.3,98.8,
+ 94.2,92.7,93.2,91.0,90.3,93.4,88.5,90.1,
+ 89.2,88.3,85.3,87.9,88.6,90.9,89.0,96.1,
+ 93.3,91.8,92.3,90.4,90.1,93.0,88.7,89.9,
+ 89.8,89.6,87.4,88.9,91.2,89.3,94.4,92.7,
+ 91.8,91.6,90.4,91.1,92.6,89.8,90.6,91.1,
+ 90.4,89.3,89.7,90.3,91.6,90.5,93.7,92.7,
+ 92.2,92.2,91.2,91.0,92.2,90.0,90.7)
> x
[1] 88.5 87.7 83.4 86.7 87.5 91.5 88.6 100.3 95.6 93.3 94.7 91.1
[13] 91.0 94.2 87.5 89.9 88.3 87.6 84.3 86.7 88.2 90.8 88.3 98.8
[25] 94.2 92.7 93.2 91.0 90.3 93.4 88.5 90.1 89.2 88.3 85.3 87.9
[37] 88.6 90.9 89.0 96.1 93.3 91.8 92.3 90.4 90.1 93.0 88.7 89.9
[49] 89.8 89.6 87.4 88.9 91.2 89.3 94.4 92.7 91.8 91.6 90.4 91.1
[61] 92.6 89.8 90.6 91.1 90.4 89.3 89.7 90.3 91.6 90.5 93.7 92.7
[73] 92.2 92.2 91.2 91.0 92.2 90.0 90.7
> length(x)
[1] 79
> mean(x);var(x)
[1] 90.66709
[1] 7.895057
> stem(x)
The decimal point is at the |
82 | 4
84 | 33
86 | 77455679
88 | 23335566790233678899
90 | 01133444567890001112256688
92 | 22236777023347
94 | 22476
96 | 1
98 | 8
100 | 3
Typing the name of a function without
parentheses prints the code of the function, just as typing the
name of any object prints the contents of that object.
> mean
function (x, ...)
UseMethod("mean")
Note that x>90 is a logical vector; for
purposes of arithmetic, TRUE counts as 1 and FALSE as 0, so
mean(x>90) returns the proportion of elements of x that are greater
than 90.
> mean(x>90)
[1] 0.5949367
x[x>90] selects those elements of x that
are > 90.
> mean(x[x>90])
[1] 92.3468
> length(x[x>90])
[1] 47
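TRUE counts as 1, so sum() of a logical vector counts the matching elements. mean() of the same vector is that count divided by the length. For the octane data that is 47 / 79, which rounds to the 0.5949367 printed above. To keep the sketch self-contained, it uses a small made-up vector, v, instead of the octane data.

```r
# a short illustrative vector (not the octane data above)
v <- c(88.5, 91.5, 100.3, 87.7, 95.6, 83.4)
above <- v > 90          # logical: FALSE TRUE TRUE FALSE TRUE FALSE
n_above <- sum(above)    # TRUE counts as 1, FALSE as 0
prop <- mean(above)      # proportion of elements above 90
big <- v[above]          # the elements themselves
print(big)
```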
A data frame is a rectangular structure;
the rows correspond to cases, items or subjects, the columns
correspond to variables. Note how x2 is made a factor; that is, a
categorical variable indicating which treatment group each subject
belongs to.
> mydata <- data.frame(y=c(1.2,3.6,5.1,4.2,2.1),
+ x1=c(1.5,2.5,6,3.1,2.2),x2=factor(c(1,1,1,2,2))
+ )
> mydata
y x1 x2
1 1.2 1.5 1
2 3.6 2.5 1
3 5.1 6.0 1
4 4.2 3.1 2
5 2.1 2.2 2
> mydata$y
[1] 1.2 3.6 5.1 4.2 2.1
> dimnames(mydata)
[[1]]
[1] "1" "2" "3" "4" "5"
[[2]]
[1] "y" "x1" "x2"
> dimnames(mydata)[[2]]
[1] "y" "x1" "x2"
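names() is a shorter way to get the column names of a data frame, and nrow() and ncol() give its dimensions. This sketch rebuilds mydata as in the transcript and checks these values. It also checks the levels of the factor x2.

```r
mydata <- data.frame(y = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))
# column names, the same as dimnames(mydata)[[2]]
cols <- names(mydata)
# number of rows (cases) and columns (variables)
dims <- c(nrow(mydata), ncol(mydata))
# x2 is a factor with two levels, "1" and "2"
lev <- levels(mydata$x2)
print(cols)
```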
A list is more general than a data frame;
it is an object that is a collection of other objects. Note the
different ways to refer to individual components of the list.
> mylist <- list(aa=c(2,44),bb=mydata)
> mylist
$aa
[1] 2 44
$bb
y x1 x2
1 1.2 1.5 1
2 3.6 2.5 1
3 5.1 6.0 1
4 4.2 3.1 2
5 2.1 2.2 2
> mylist$bb$x1
[1] 1.5 2.5 6.0 3.1 2.2
> mylist[[2]]
y x1 x2
1 1.2 1.5 1
2 3.6 2.5 1
3 5.1 6.0 1
4 4.2 3.1 2
5 2.1 2.2 2
> mylist[[2]][3,2]
[1] 6
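A data frame is itself a list whose components are equal-length columns, which is why $ and [[ ]] work on both. This sketch checks that the different ways of reaching the bb component return the same object. It also shows that single brackets return a one-element sub-list rather than the component itself.

```r
mydata <- data.frame(y = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))
mylist <- list(aa = c(2, 44), bb = mydata)
# by name with $, by name with [[ ]], and by position with [[ ]]
a <- mylist$bb
b <- mylist[["bb"]]
c2 <- mylist[[2]]
# single brackets keep the list wrapper: a list of length 1
sub <- mylist[2]
# row 3, column 2 of the data frame (x1 for the third case)
val <- mylist[[2]][3, 2]
print(val)
```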
For anything longer than a line, a more convenient way to write a new
function or edit an existing one is fix(myfun), which opens the
function in an editor. The example below simply types a one-line
function at the prompt.
> myfun <- function(x) 3*x^2
> myfun
function(x) 3*x^2
> myfun(2)
[1] 12
> myfun(0:10)
[1] 0 3 12 27 48 75 108 147 192 243 300
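A function longer than one line encloses its body in braces, and it returns the value of its last expression. The sketch below is a hypothetical helper, meanvar(), of the kind one might write with fix(). It is not part of R.

```r
# hypothetical helper: mean and sample variance of a vector
meanvar <- function(x) {
  m <- mean(x)
  v <- var(x)
  c(mean = m, var = v)   # the last expression is the return value
}
mv <- meanvar(1:8)
print(mv)
```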
In practice, it is advisable to enter and edit the data in a
spreadsheet or database program, export it as a plain text file, and
import it into R. read.table() reads x2 as a numeric column, so it is
made a factor after importing; is.factor() checks this before and after.
TEXT FILE: "mydata.txt"
y x1 x2
1.2 1.5 1
3.6 2.5 1
5.1 6.0 1
4.2 3.1 2
2.1 2.2 2
> mydata1 <- read.table("mydata.txt", header=TRUE)
> mydata1
y x1 x2
1 1.2 1.5 1
2 3.6 2.5 1
3 5.1 6.0 1
4 4.2 3.1 2
5 2.1 2.2 2
> is.factor(mydata1$x2)
[1] FALSE
> mydata1$x2 <- factor(mydata1$x2)
> is.factor(mydata1$x2)
[1] TRUE
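In more recent versions of R, the colClasses argument of read.table() sets the type of each column at read time, so the separate factor() step is not needed. This sketch assumes a much newer R than the 1.1.0 release shown here. The text= argument, which reads from a string instead of a file, is also a later addition. The string has the same contents as mydata.txt.

```r
# the same contents as mydata.txt, supplied as a string (newer R only)
txt <- "y x1 x2
1.2 1.5 1
3.6 2.5 1
5.1 6.0 1
4.2 3.1 2
2.1 2.2 2"
# colClasses sets the type of each column while reading
mydata2 <- read.table(text = txt, header = TRUE,
                      colClasses = c("numeric", "numeric", "factor"))
str(mydata2)
```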
Use split() to break down the data into
treatment groups. Note that the result of split() is a list.
> split(mydata$y,mydata$x2)
$"1"
[1] 1.2 3.6 5.1
$"2"
[1] 4.2 2.1
> lapply(split(mydata$y,mydata$x2),mean)
$"1"
[1] 3.3
$"2"
[1] 3.15
> sapply(split(mydata$y,mydata$x2),mean)
1 2
3.30 3.15
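tapply() does the split and the apply in a single call. This sketch checks that it gives the same group means, 3.30 and 3.15, as sapply(split(...), mean) above.

```r
mydata <- data.frame(y = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))
# split-then-apply in two steps, as in the transcript
two_step <- sapply(split(mydata$y, mydata$x2), mean)
# the same in one step: mean of y within each level of x2
one_step <- tapply(mydata$y, mydata$x2, mean)
print(one_step)
```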
Some graphics to try
> boxplot(split(mydata$y,mydata$x2))
> plot(mydata$x1,mydata$y)
> pairs(mydata)
> xgrid <- seq(-3,3,length=100)
> plot(xgrid, dnorm(xgrid), type="l")
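Keeping track of your workspace: search() lists the search path,
objects() lists the objects in the workspace (objects(2) lists those
at position 2 of the search path), attach(mydata) puts the columns of
mydata on the search path so they can be used by name, and detach(2)
removes them again.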
> search()
> objects()
> attach(mydata)
> search()
> objects(1)
> objects(2)
> detach(2)
> search()
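This sketch runs the same attach/detach sequence non-interactively and checks each step. It also shows with(), a base R alternative that evaluates a single expression inside a data frame without changing the search path.

```r
mydata <- data.frame(y = c(1.2, 3.6, 5.1, 4.2, 2.1),
                     x1 = c(1.5, 2.5, 6, 3.1, 2.2),
                     x2 = factor(c(1, 1, 1, 2, 2)))
attach(mydata)                    # inserted at position 2 of the search path
on_path <- search()[2]            # "mydata"
cols <- objects(2)                # column names now visible by name
ybar <- mean(y)                   # y is found on the search path
detach(2)                         # remove it again
gone <- !("mydata" %in% search())
# with() evaluates one expression inside the data frame, no attach needed
ybar2 <- with(mydata, mean(y))
print(ybar2)
```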
Online help
> ?mean