Most of you still need to get caught up.
Learn the difference between commands in UNIX and functions in S. What does it mean when we say that S is "object-oriented"?
When we meet next Friday, I will expect you to have looked at the "Effects of Effluent on the Kapuskasing River" data and we will discuss what we have learned.
I will also expect you to have set up the data files for the other SSC Case Study, "Habitat Availability in Dutch Creek," as data frames in S. You may find that this process is messier than you would like.
I plan to talk about model fitting, beginning with Chapter 10 in Spector. Have a look at the following functions: lm(), glm(), anova() and update() and see how to specify a model. Note that lm() doesn't work on hydra and glm() isn't available on hydra, so you will have to use data.
Try some simple examples of model fitting. In particular, try the two examples on the S2MA3 web site: simple linear regression with repeated x-values and two-factor ANOVA with replication. These examples are worked out in detail there. Convince yourself that it was easier to learn S and do the calculations in S than it would have been to do the calculations "by hand."
I think I have taken the analysis of the Kapuskasing River data as far as it can go. I created a new column (lingrow) with the growth increments (ingrow) transformed to logs, then another new column (lingrow1) with the outlier removed.I made a new data frame (operc1) with rows removed if lingrow1 == NA, since the model fitting functions won't work with missing data. I had to use glm() because the design is unbalanced. My results are as follows.
> operc[1:10,] date site fish flen fage yrclass inage ingrow ygrow lingrow lingrow1 1 sep94 downff 503 39.7 5 89 1 NA 89 NA NA 2 sep94 downff 503 39.7 5 89 2 NA 90 NA NA 3 sep94 downff 503 39.7 5 89 3 6.8811 91 1.928779 1.928779 4 sep94 downff 503 39.7 5 89 4 4.8336 92 1.575592 1.575592 5 sep94 downff 503 39.7 5 89 5 3.2650 93 1.183260 1.183260 6 sep94 downff 505 46.2 10 84 1 NA 84 NA NA 7 sep94 downff 505 46.2 10 84 2 NA 85 NA NA 8 sep94 downff 505 46.2 10 84 3 3.1331 86 1.142023 1.142023 9 sep94 downff 505 46.2 10 84 4 5.4613 87 1.697687 1.697687 10 sep94 downff 505 46.2 10 84 5 5.9248 88 1.779147 1.779147 > operc1<-operc[!is.na(lingrow1),] > operc1[1:10,] date site fish flen fage yrclass inage ingrow ygrow lingrow lingrow1 3 sep94 downff 503 39.7 5 89 3 6.8811 91 1.9287785 1.9287785 4 sep94 downff 503 39.7 5 89 4 4.8336 92 1.5755915 1.5755915 5 sep94 downff 503 39.7 5 89 5 3.2650 93 1.1832598 1.1832598 8 sep94 downff 505 46.2 10 84 3 3.1331 86 1.1420229 1.1420229 9 sep94 downff 505 46.2 10 84 4 5.4613 87 1.6976869 1.6976869 10 sep94 downff 505 46.2 10 84 5 5.9248 88 1.7791469 1.7791469 11 sep94 downff 505 46.2 10 84 6 4.5335 89 1.5114943 1.5114943 12 sep94 downff 505 46.2 10 84 7 1.1277 90 0.1201802 0.1201802 13 sep94 downff 505 46.2 10 84 8 1.5824 91 0.4589427 0.4589427 14 sep94 downff 505 46.2 10 84 9 1.2797 92 0.2466257 0.2466257 > operc.fit<-glm(lingrow1~inage+ygrow+site,data=operc1) > anova(operc.fit,test="F") Analysis of Deviance Table Gaussian model Response: lingrow1 Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev F Value Pr(F) NULL 1636 1027.621 inage 17 781.5660 1619 246.055 307.1458 0.0000000 ygrow 18 3.7770 1601 242.278 1.4018 0.1205401 site 2 2.9352 1599 239.343 9.8047 0.0000586 > operc.fit.1<-update(operc.fit,~.+inage:site) > anova(operc.fit.1,test="F") Analysis of Deviance Table Gaussian model Response: lingrow1 Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev F Value Pr(F) NULL 1636 1027.621 inage 17 781.5660 1619 246.055 310.5557 0.0000000 ygrow 18 3.7770 1601 242.278 1.4174 0.1133496 site 2 2.9352 1599 239.343 9.9135 0.0000527 inage:site 32 7.3652 1567 231.978 1.5547 0.0251957 >
Using glm() to analyse an unbalanced design, it makes a difference what order the variables are added to the model. Try adding the terms in a different order and see how different the ANOVA is.