11.6 - Variable Combinations

So far, we have assumed that the classification tree partitions the space only by hyperplanes parallel to the coordinate planes. In the two-dimensional case, this means dividing the space only by horizontal or vertical lines. How much do we suffer from such restrictive partitions?

Let's take a look at an example.

In the example below, we might want to split along the dotted diagonal line, which separates the two classes well. Splits parallel to the coordinate axes are inefficient for this data set: many axis-parallel splits are needed to approximate the result of a single split along a sloped line.
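To make this concrete, here is a small illustrative sketch (the data set and the split search are invented for this illustration, not taken from the text). The true class boundary is the sloped line \(x_1 + x_2 = 1\); we compare the misclassification error of the best single axis-parallel split with that of a single split along the sloped line itself.

```python
# Sketch: on data whose true boundary is the line x1 + x2 = 1, compare
# the best single axis-parallel split with one split along the sloped line.
import itertools

# Deterministic grid of points in the unit square.
points = [(i / 10, j / 10) for i, j in itertools.product(range(11), repeat=2)]
# Small offset avoids floating-point ties for points on the boundary line.
labels = [1 if x1 + x2 > 1.001 else 0 for x1, x2 in points]

def error_of_split(predicate):
    """Misclassifications when one split defines the two regions,
    each region labeled with its majority class."""
    left = [y for p, y in zip(points, labels) if predicate(p)]
    right = [y for p, y in zip(points, labels) if not predicate(p)]
    def side_error(ys):
        return min(sum(ys), len(ys) - sum(ys)) if ys else 0
    return side_error(left) + side_error(right)

# Best single split parallel to a coordinate axis (x1 <= t or x2 <= t).
thresholds = [k / 10 + 0.05 for k in range(11)]
best_axis = min(error_of_split(lambda p, j=j, t=t: p[j] <= t)
                for j in (0, 1) for t in thresholds)

# One split along the sloped line x1 + x2 <= 1.
diag = error_of_split(lambda p: p[0] + p[1] <= 1.001)

print(best_axis, diag)  # the sloped split alone separates the classes exactly
```

A single sloped split achieves zero error here, while the best axis-parallel split still misclassifies a substantial fraction of the points; the tree would need many further axis-parallel splits to approximate the diagonal boundary.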


There are classification tree extensions which, instead of thresholding individual variables, perform linear discriminant analysis (LDA) at every node.

Alternatively, we could use more complicated questions, for instance questions that involve linear combinations of variables:

\(\sum_j a_j x_{\cdot j} \le c \,?\)
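As a rough sketch of what searching for such a split could involve (the grid search over directions, the Gini impurity criterion, and all names here are illustrative assumptions, not from the text), one could scan candidate coefficient vectors \(a\) and cutpoints \(c\) and keep the split with the lowest weighted impurity:

```python
# Hypothetical sketch: search for a linear-combination split
# a1*x1 + a2*x2 <= c by grid search over directions and cutpoints.
import math

def gini(ys):
    """Gini impurity of a list of 0/1 labels."""
    if not ys:
        return 0.0
    p = sum(ys) / len(ys)
    return 2 * p * (1 - p)

def best_linear_split(points, labels, n_angles=36):
    """Return (impurity, a, c) for the best split sum_j a_j*x_j <= c found
    by scanning n_angles directions and all data-projected cutpoints."""
    best = (float("inf"), None, None)
    n = len(labels)
    for k in range(n_angles):
        theta = math.pi * k / n_angles
        a = (math.cos(theta), math.sin(theta))
        # Candidate cutpoints: projections of the data onto direction a.
        for c in sorted(a[0] * x1 + a[1] * x2 for x1, x2 in points):
            left = [y for p_, y in zip(points, labels)
                    if a[0] * p_[0] + a[1] * p_[1] <= c]
            right = [y for p_, y in zip(points, labels)
                     if a[0] * p_[0] + a[1] * p_[1] > c]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[0]:
                best = (score, a, c)
    return best

# Toy data separable only by a sloped line, not by any single-variable threshold.
pts = [(0.9, 0.0), (0.0, 0.9), (0.2, 0.2), (0.6, 0.6), (0.8, 0.5), (0.5, 0.7)]
ys = [0, 0, 0, 1, 1, 1]   # class 1 where x1 + x2 > 1
score, a, c = best_linear_split(pts, ys)
```

Note the cost: where a single-variable split scans each variable once, this sketch scans every direction–cutpoint pair, which hints at why such flexible questions increase the amount of computation significantly.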

This would increase the amount of computation significantly. Moreover, empirical results suggest that more flexible questions often do not yield clearly better classification, and can even make it worse, because overfitting is more likely with more flexible splitting questions. It seems that using the right-sized tree matters more than performing good splits at individual nodes.