WQD.4 - Applying Tree-Based Methods


Sample R code for Tree-based Models and Random Forest

The response variable quality is treated as an ordinal variable, not a continuous one. As noted earlier, the proportions of wines in the very low (4 or below) and very high (8 or above) quality categories are small.

[Table: category classification]

Hence the wines are grouped into three categories: quality scores 3, 4 and 5 are combined into one category (Low), 6 forms its own category (Medium), and 7, 8 and 9 are combined into another (High).
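The grouping can be sketched in base R with cut(). This is a minimal illustration, not the course's own code: the vector quality below holds made-up scores, and the names quality and quality_class are assumptions.

```r
# Hypothetical quality scores, one per possible value; the real data frame
# and its column names are not shown in the text.
quality <- c(3, 4, 5, 6, 7, 8, 9)

# Intervals are closed on the right by default, so the breaks give
# (-Inf, 5] -> Low, (5, 6] -> Medium, (6, Inf) -> High.
quality_class <- cut(quality,
                     breaks = c(-Inf, 5, 6, Inf),
                     labels = c("Low", "Medium", "High"),
                     ordered_result = TRUE)

print(data.frame(quality, quality_class))
```

Setting ordered_result = TRUE keeps the ordering Low < Medium < High, in line with treating quality as ordinal.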

Because the response is now a three-level category, the fitted model is a classification tree. The following tree is obtained:

[Figure: R output]

[Figure: tree-based analysis plot]
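A tree of this kind could be fitted and applied to held-out data along the following lines. This is only a sketch under stated assumptions: it uses the rpart package, which may differ from what the sample R code uses, and the helper fit_quality_tree and the column name quality_class are hypothetical.

```r
library(rpart)  # recommended package shipped with standard R installations

# Hypothetical helper. `train` and `test` are assumed to be data frames with
# a factor column `quality_class` (Low / Medium / High) plus the predictors.
# The original numeric quality column should be dropped beforehand so that
# it is not used as a predictor.
fit_quality_tree <- function(train, test) {
  # method = "class" requests a classification tree for a factor response
  fit <- rpart(quality_class ~ ., data = train, method = "class")

  # Predicted category for each row of the test data
  predicted <- predict(fit, newdata = test, type = "class")

  # Cross-tabulate the test-data categories against the assigned categories
  list(fit = fit,
       table = table(test_data = test$quality_class, classified = predicted))
}
```

Calling fit_quality_tree(train, test) returns the fitted tree together with a cross-tabulation of test-data categories against assigned categories; plot(fit) followed by text(fit) draws the tree.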


Applying the fitted tree to the test data gives the following misclassification table:

             Quality Classification
Test Data    Low    Medium    High
Low          371    277        38
Medium       214    495       251
High          19    167       205

Accuracy = (371 + 495 + 205) / 2037 = 1071 / 2037 ≈ 52.6%
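The accuracy calculation can be reproduced from the counts in the table above. The sketch below assumes that rows are the test-data categories and columns are the categories assigned by the tree; the object names are illustrative.

```r
# Counts copied from the misclassification table: rows are the test-data
# categories, columns are the categories assigned by the tree.
conf_mat <- matrix(c(371, 277,  38,
                     214, 495, 251,
                      19, 167, 205),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(test_data  = c("Low", "Medium", "High"),
                                   classified = c("Low", "Medium", "High")))

# Overall accuracy: correctly classified wines (the diagonal) over all wines
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)

# Share of each test-data category that was classified correctly
per_class <- diag(conf_mat) / rowSums(conf_mat)

print(round(accuracy, 3))
print(round(per_class, 3))
```

The per-category shares show whether the errors are concentrated in one quality group or spread across all three.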