# Lesson 11: Tree-based Methods


### Introduction

Key Learning Goals for this Lesson:

- Understand the basic idea of decision trees.
- Understand the three elements in the construction of a classification tree.
- Understand the definition of the impurity function and several example functions.
- Know how to estimate the posterior probabilities of classes in each tree node.
- Understand the advantages of tree-structured classification methods.
- Understand the resubstitution error rate and the cost-complexity measure, their differences, and why the cost-complexity measure is introduced.
- Understand weakest-link pruning.
- Understand the fact that the best pruned subtrees are nested and can be obtained recursively.
- Understand the method based on cross-validation for choosing the complexity parameter and the final subtree.
- Understand the purpose of model averaging.
- Understand the bagging procedure.
- Understand the random forest procedure.
- Understand the boosting approach.

Textbook reading: Chapter 8: Tree-Based Methods.

Decision trees can be used for both regression and classification problems. Here we focus on classification trees. Classification trees take a very different approach to classification than prototype methods such as k-nearest neighbors. The basic idea of prototype methods is to identify representative points, or centroids, and classify each new observation according to the prototypes nearest to it.
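To make the contrast concrete, here is a minimal sketch of a prototype method: a nearest-centroid classifier that represents each class by the mean of its training points. The function names are my own for illustration; this is not code from the textbook.

```python
import numpy as np

def nearest_centroid_fit(X, y):
    # Compute one centroid (the mean feature vector) per class label.
    classes = np.unique(y)
    centroids = np.array([X[y == c].mean(axis=0) for c in classes])
    return classes, centroids

def nearest_centroid_predict(X, classes, centroids):
    # Assign each point to the class of its nearest centroid
    # (Euclidean distance to every centroid, then argmin).
    d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return classes[np.argmin(d, axis=1)]

# Two well-separated classes in the plane.
X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
y = np.array([0, 0, 1, 1])
classes, centroids = nearest_centroid_fit(X, y)
preds = nearest_centroid_predict(np.array([[0.0, 0.5], [5.0, 5.5]]),
                                 classes, centroids)
```

The decision boundary here is implicit: the set of points equidistant from two centroids, which is a hyperplane. A classification tree, by contrast, builds its regions explicitly.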

They also differ from linear methods, e.g., linear discriminant analysis, quadratic discriminant analysis, and logistic regression, which draw a single global classification boundary (a hyperplane, in the linear case) across the feature space.

Classification trees are a hierarchical way of partitioning the space. We start with the entire space and recursively divide it into smaller regions. In the end, every region is assigned a class label.
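The recursive partitioning idea can be sketched in a few lines. The toy implementation below greedily picks, at each node, the axis-aligned split that minimizes the weighted Gini impurity of the two children, and recurses until the node is pure or a depth limit is hit. It is a simplified sketch of the CART-style procedure developed later in the lesson, not the full algorithm (no pruning, no cost-complexity measure); all names are my own.

```python
import numpy as np

def gini(y):
    # Gini impurity of the labels in one node: 1 - sum_k p_k^2.
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Scan every feature j and threshold t for the split X[:, j] <= t
    # that minimizes the weighted impurity of the two child nodes.
    best_j, best_t, best_score = None, None, np.inf
    n = len(y)
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left = X[:, j] <= t
            if left.all() or (~left).all():
                continue  # degenerate split, skip
            score = (left.sum() * gini(y[left])
                     + (~left).sum() * gini(y[~left])) / n
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t

def majority(y):
    vals, counts = np.unique(y, return_counts=True)
    return vals[np.argmax(counts)]

def grow(X, y, depth=0, max_depth=2):
    # Leaf: node is pure, depth limit reached, or no valid split remains.
    if gini(y) == 0.0 or depth == max_depth:
        return majority(y)
    j, t = best_split(X, y)
    if j is None:
        return majority(y)
    left = X[:, j] <= t
    # Internal node: (feature, threshold, left subtree, right subtree).
    return (j, t,
            grow(X[left], y[left], depth + 1, max_depth),
            grow(X[~left], y[~left], depth + 1, max_depth))

def predict_one(tree, x):
    # Follow the splits down to a leaf, which stores a class label.
    while isinstance(tree, tuple):
        j, t, l, r = tree
        tree = l if x[j] <= t else r
    return tree
```

Each internal node splits one region into two; the leaves are the final regions of the partition, each carrying a single class label.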

#### Tree Structured Classifier

The following textbook presents Classification and Regression Trees (CART):

Reference: Classification and Regression Trees by L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone, Chapman & Hall, 1984.