WQD.4 - Applying Tree-Based Methods

Sample R code for Tree-based Models and Random Forest
The response variable quality is assumed to be an ordinal variable, not a continuous variable. It has been noted before that proportions in too low (4 or less) or too high (8 or above) categories are small.
Hence wines are classified into three categories by combining 3, 4, and 5 into one category (Low), 6 (Medium) and 7, 8 and 9 into another (High).
The following regression tree is obtained:
Applying the procedure on Test data, the following mis-classification table is obtained:
Quality Classification | |||
Test Data | Low | Medium | High |
Low | 371 | 277 | 38 |
Medium | 214 | 495 | 251 |
High | 19 | 167 | 205 |
Accuracy | (371 + 495 + 205) / 2037 = 50% |