# CD.5: Decision Tree


Sample R code for tree-based algorithms:

```r
library(tree)

# Fit an unpruned classification tree on Training Set 1
TXTrain1$TxClassLib <- as.factor(TXTrain1$TxClassLib)
TXTrainTree1 <- tree(TxClassLib ~ ., data=TXTrain1)
plot(TXTrainTree1, col="dark red")
text(TXTrainTree1, pretty=0, cex=0.6, col="dark red")
mtext("Decision Tree (Unpruned) for Training Set 1", side=3, line=2, cex=0.8, col="dark red")

# Training-set misclassification proportion
m <- misclass.tree(TXTrainTree1)
propmisTrain1 <- m / length(TXTrainTree1$y)
cat("Proportion of Misclassification in Training Set 1:", propmisTrain1)

# Test-set misclassification proportion (sum(diag(.)) is the trace of the confusion table)
TXTest1Treefit1 <- predict(TXTrainTree1, TXTest1, type="class")
Tab1 <- table(TXTest1Treefit1, TXTest1$TxClassLib)
propmisTest1 <- 1 - sum(diag(Tab1)) / length(TXTest1Treefit1)
cat("Proportion of Misclassification in Test Set 1 =", propmisTest1)

# Prune the tree to (at most) 20 terminal nodes and re-evaluate
TXTrainPruneTree1 <- prune.misclass(TXTrainTree1, best=20)
m <- misclass.tree(TXTrainPruneTree1)
m / length(TXTrainPruneTree1$y)
plot(TXTrainPruneTree1, col="dark red")
text(TXTrainPruneTree1, pretty=0, cex=0.6, col="dark red")
mtext("Decision Tree for Training Set 1", side=3, line=2, cex=0.8, col="dark red")
TXTest1PruneTreefit1 <- predict(TXTrainPruneTree1, TXTest1, type="class")
Tab1 <- table(TXTest1PruneTreefit1, TXTest1$TxClassLib)
propmisTest1 <- 1 - sum(diag(Tab1)) / length(TXTest1PruneTreefit1)
cat("Proportion of Misclassification in Test Set 1 =", propmisTest1)
```

```r
################### Random Forest ###################

library(randomForest)

# Fit a random forest on the first 8 predictor columns; column 9 is the class label
TXTrainRF1 <- randomForest(TXTrain1[,1:8], TXTrain1[,9], ntree=100, importance=T, proximity=T)
# Alternatively, supply the test set directly so test error is tracked per tree:
# TXTrainRF1 <- randomForest(TXTrain1[,1:8], TXTrain1[,9], xtest=TXTest1[,1:8], ytest=TXTest1[,9], ntree=100, importance=T, proximity=T)
plot(TXTrainRF1, main="OOB Error Rate: Set 1", cex=0.4)
TXTrainRF1
varImpPlot(TXTrainRF1, pch=19, col="dark red", main="Variable Importance: Set 1", cex=0.8)
```



The unpruned tree algorithm was applied to all training sets, and the misclassification probability was calculated for both the training and test sets. All training sets give rise to very similar decision trees; three representative trees are shown below as examples.

The following table summarizes the misclassification probabilities for tree classification.

Therefore the overall misclassification probability of the 10-fold cross-validation is 17.9%, which is the mean misclassification probability over the test sets.
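As a minimal sketch, the overall cross-validation estimate is just the mean of the ten per-fold test-set error rates. The `foldErrors` values below are hypothetical placeholders (chosen only so that they average to the reported 17.9%), not the errors actually obtained in this analysis.

```r
# Hypothetical per-fold test misclassification proportions (10-fold CV)
foldErrors <- c(0.16, 0.18, 0.19, 0.17, 0.20, 0.18, 0.17, 0.19, 0.18, 0.17)

# Overall CV estimate = mean of the fold-wise test errors
overallCV <- mean(foldErrors)
cat("Overall 10-fold CV misclassification probability:", overallCV, "\n")
```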

Pruning was tried for this decision tree, but it did not improve the result.
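Rather than fixing `best=20` by hand, the pruning size can be chosen by cross-validation with the `tree` package's `cv.tree()`. The sketch below is illustrative only and runs on the built-in `iris` data, since the `TXTrain1` data are not reproduced here; it is not the exact procedure used in this analysis.

```r
library(tree)

set.seed(1)
# Fit a classification tree, then cross-validate over pruning sizes,
# scoring candidate subtrees by misclassification count
irisTree <- tree(Species ~ ., data=iris)
cvIris   <- cv.tree(irisTree, FUN=prune.misclass)   # 10-fold CV by default

# Pick the number of terminal nodes with the smallest CV deviance
bestSize   <- cvIris$size[which.min(cvIris$dev)]
prunedTree <- prune.misclass(irisTree, best=bestSize)
summary(prunedTree)
```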

At first glance, this high error rate compared to k-NN and LDA looks surprising. LDA uses a single linear classifier over the whole sample space, whereas the tree procedure recursively partitions the sample space to reduce the misclassification error. One would therefore expect the tree procedure to always give better results than LDA.

However, note that LDA takes into account linear combinations of the predictors, whereas a tree always divides the sample space with splits parallel to the axes. If the separation runs along any other direction, the tree will not be able to capture it. This is exactly what is happening here.
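This effect can be demonstrated on simulated data (not the study's data): two classes separated by the diagonal line x2 = x1. LDA can model this oblique boundary with one linear combination of the predictors, while a tree must approximate it with a staircase of axis-parallel splits.

```r
library(MASS)   # for lda()
library(tree)

set.seed(42)
# Two classes separated exactly along the diagonal x2 = x1
n  <- 400
x1 <- runif(n); x2 <- runif(n)
y  <- factor(ifelse(x2 > x1, "A", "B"))
dat <- data.frame(x1, x2, y)

# LDA: one oblique linear boundary
ldaFit <- lda(y ~ x1 + x2, data=dat)
ldaErr <- mean(predict(ldaFit, dat)$class != dat$y)

# Tree: axis-parallel splits only
treeFit <- tree(y ~ x1 + x2, data=dat)
treeErr <- mean(predict(treeFit, dat, type="class") != dat$y)

cat("LDA training error: ", ldaErr, "\n")
cat("Tree training error:", treeErr, "\n")
```

On data like this, LDA typically recovers the diagonal boundary almost exactly, while the tree's staircase approximation leaves a residual error near the boundary, mirroring the behaviour observed above.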
