Logistic Regression for Classification

2002-04-06

> treedata
   obj A B C D E
1    1 y n y n y
2    1 n n y n y
3    1 y n y y y
4    1 n y n n n
5    1 y y n y n
6    1 n n y n y
7    1 n n n n n
8    1 n n n n n
9    0 n n n n y
10   0 y y n n y
11   0 n n y y y
12   0 y y y y n
13   0 n y n y n
14   0 n n n n y
15   0 n n n n y
16   0 n y n n y
17   0 n n n n y
18   0 n y n n y
19   0 n y n n y
20   0 n n n n y
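
To rerun the session, the printed data frame can be rebuilt by hand. The following is a minimal sketch transcribed from the listing above; the helper name `yn` is hypothetical and not part of the original session, and coding the predictors as factors with levels "n" and "y" is an assumption chosen to match coefficient names such as `Ay` in the later output.

```r
# Hypothetical helper: turn a string of y/n characters into a factor with
# "n" as the reference level, one character per row of the listing.
yn <- function(s) factor(strsplit(s, "")[[1]], levels = c("n", "y"))

treedata <- data.frame(
  obj = c(rep(1, 8), rep(0, 12)),   # rows 1-8 have obj = 1, rows 9-20 have obj = 0
  A   = yn("ynynynnnnynynnnnnnnn"),
  B   = yn("nnnyynnnnynyynnynyyn"),
  C   = yn("yyynnynnnnyynnnnnnnn"),
  D   = yn("nnynynnnnnyyynnnnnnn"),
  E   = yn("yyynnynnyyynnyyyyyyy")
)

# Quick consistency check against the listing: 20 rows, 8 with obj = 1.
dim(treedata)
table(treedata$obj)
```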
 
> fittree <- glm(cbind(obj,1-obj)~A+B+C+D+E, data=treedata, family=binomial(link=logit))
 
> fittree1 <- step(fittree)
Start:  AIC= 23.7 
 cbind(obj, 1 - obj) ~ A + B + C + D + E 
 
       Df Deviance    AIC
<none>      11.703 23.703
- C     1   14.170 24.170
- A     1   14.208 24.208
- D     1   15.394 25.394
- B     1   15.437 25.437
- E     1   22.426 32.426
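
The AIC column in the table above can be checked by hand. For this binary response the values shown are consistent with AIC = deviance + 2 * (number of coefficients): six coefficients in the full model (intercept plus five terms) and five after dropping any one term. The sketch below only redoes that arithmetic from the printed deviances; the vector names are illustrative.

```r
# Deviances copied from the step() table above.
dev <- c(none = 11.703, C = 14.170, A = 14.208,
         D = 15.394, B = 15.437, E = 22.426)

# Number of estimated coefficients: 6 for the full model, 5 after one drop.
k <- ifelse(names(dev) == "none", 6, 5)

aic <- dev + 2 * k
aic

# The candidate with the smallest AIC is the one step() keeps.
names(which.min(aic))
```

Because no single-term deletion lowers the AIC, `step` stops at the starting model, which is why the `Call` reported for `fittree1` later in the session still contains all five terms.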
 
> anova(fittree1,test="Chisq")
Analysis of Deviance Table
 
Model: binomial, link: logit
 
Response: cbind(obj, 1 - obj)
 
Terms added sequentially (first to last)
 
 
     Df Deviance Resid. Df Resid. Dev P(>|Chi|)
NULL                    19    26.9205          
A     1   1.0949        18    25.8255    0.2954
B     1   2.2827        17    23.5429    0.1308
C     1   0.5149        16    23.0279    0.4730
D     1   0.6017        15    22.4263    0.4379
E     1  10.7236        14    11.7027    0.0011
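
Each row of the table above compares the drop in deviance for a term with a chi-squared distribution on 1 degree of freedom. As a rough check, assuming only the printed (rounded) values, the P(>|Chi|) and Resid. Dev columns can be reproduced as follows.

```r
# Sequential deviance reductions copied from the Deviance column.
dev_drop <- c(A = 1.0949, B = 2.2827, C = 0.5149, D = 0.6017, E = 10.7236)

# Upper-tail chi-squared probabilities on 1 df; these should agree with the
# P(>|Chi|) column up to rounding of the inputs.
p <- pchisq(dev_drop, df = 1, lower.tail = FALSE)
round(p, 4)

# Residual deviance after each term: null deviance minus the running total.
resid_dev <- 26.9205 - cumsum(dev_drop)
round(resid_dev, 4)
```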
 
> summary(fittree1)
 
Call:
glm(formula = cbind(obj, 1 - obj) ~ A + B + C + D + E, family = binomial(link = logit), 
    data = treedata)
 
Deviance Residuals: 
       Min          1Q      Median          3Q         Max  
-2.231e+00  -6.366e-01  -7.821e-05   1.997e-01   1.227e+00  
 
Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   18.566     89.128   0.208    0.835
Ay            10.235     60.252   0.170    0.865
By           -18.112     89.116  -0.203    0.839
Cy             2.517      1.744   1.443    0.149
Dy           -10.805     60.250  -0.179    0.858
Ey           -20.060     89.118  -0.225    0.822
 
(Dispersion parameter for binomial family taken to be 1)
 
    Null deviance: 26.920  on 19  degrees of freedom
Residual deviance: 11.703  on 14  degrees of freedom
AIC: 23.703
 
Number of Fisher Scoring iterations: 10
 
> boxplot(split(invlogit(predict(fittree1)),treedata$obj),
+         ylab="Score",xlab="Objective")
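
Note that `invlogit` is not a base R function, so it was presumably defined earlier or loaded from a package. A minimal sketch of one possible definition, assuming it is the usual inverse of the logit link, is given below; base R's `plogis` computes the same quantity.

```r
# Assumed definition: inverse logit, mapping a linear predictor to (0, 1).
invlogit <- function(x) 1 / (1 + exp(-x))

eta <- c(-2, 0, 2)           # illustrative linear-predictor values
invlogit(eta)

# Should agree with the logistic distribution function in base R.
all.equal(invlogit(eta), plogis(eta))
```

With this definition, `invlogit(predict(fittree1))` converts the default link-scale predictions to fitted probabilities, the same values that `predict(fittree1, type = "response")` returns.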
 

Statistics 4P03/6P03