11.8.1 - Preliminaries for Pruning

First, we would grow the tree to a large size. Denote this maximum size by \(T_{max}\). Stopping criterion is not important here because as long as the tree is fairly big, it doesn't really matter when to stop. The overgrown tree will be pruned back eventually. There are a few ways of deciding when to stop:

  1. Keep going until all terminal nodes are pure (contain only one class).
  2. Keep going until the number of data in each terminal node is no greater than a certain threshold, say 5, or even 1.
  3. As long as the tree is sufficiently large, the size of the initial tree is not critical.

The key here is to make the initial tree sufficiently big before pruning back.


Now we need to introduce a notation... Let's take a look at the following definitions:

1. Descendant: a node \(t^{'}\) is a descendant of node t if there is a connected path down the tree leading from t to \(t{'}\) .

2. Ancestor: t is an ancestor of \(t^{'}\) if \(t^{'}\) is its descendant.

3. A branch \(T_t\) of T with root node \(t \in T\) consists of the node t and all descendants of t in T .

4. Pruning a branch \(T_t\) from a tree T consists of deleting from T all descendants of t , that is, cutting off all of \(T_t\) except its root node. The tree pruned this way will be denoted by \(T - T_t\) .

5. If \(T^{'}\) is gotten from T by successively pruning off branches, then \(T^{'}\) is called a pruned subtree of T and denoted by \(T^{'} < T\)

tree graph tree graph tree graph



Optimal Subtrees

Even for a moderately sized \(T_{max}\), there is an enormously large number of subtrees and an even larger number ways to prune the initial tree to get any. We, therefore, cannot exhaustively go through all the subtrees to find out which one is the best in some sense. Moreover, we typically do not have a separate test dataset to serve as a basis for selection.

A smarter method is necessary. A feasible method of pruning should ensure the following:

  • The subtree is optimal in a certain sense, and
  • The search of the optimal subtree should be computationally tractable.