Analysis of Classification Data

In this example of data mining for knowledge discovery, we consider a classification problem in which a large number of objects are to be classified based on many attributes. A set of 40 characteristics, or attributes, is measured on 5500 items that belong to 11 different categories of texture. The textures include a grass lawn, pressed calf leather, handmade paper, cotton canvas, etc. All of the attributes are measured on a continuous scale. The data are obtained from (https://sci2s.ugr.es/keel/dataset.php?cod=72#sub2).

Objective of the Analysis

Pattern recognition and classification of 5500 objects into 11 classes based on 40 attributes.

Data files for this case:

Texture.csv - full dataset

TXTrain1.csv
TXTrain2.csv
TXTrain3.csv
TXTrain4.csv
TXTrain5.csv
TXTrain6.csv
TXTrain7.csv
TXTrain8.csv
TXTrain9.csv
TXTrain10.csv

TXTest1.csv
TXTest2.csv
TXTest3.csv
TXTest4.csv
TXTest5.csv
TXTest6.csv
TXTest7.csv
TXTest8.csv
TXTest9.csv
TXTest10.csv

Texture.zip - all data files above together in a .zip file for convenience

Overview of Classification Problem and Cross-Validation

A classification problem may be treated as a special type of regression problem in which, based on the values of the predictors, each observation is placed into one and only one of the categories. For the i-th object, i = 1, …, n, the probabilities of being placed into the individual categories sum to 1. These probabilities generally differ from class to class, and the object is put into the class for which its probability is largest.
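The assignment rule described above can be sketched in a few lines of Python. This is a minimal illustration only: the function name classify, the three class labels chosen, and the probability values are all hypothetical, picked to show the rule at work and not taken from the texture data.

```python
def classify(probabilities, class_labels):
    """Return the label whose probability is largest (the argmax rule)."""
    # The class probabilities for a single object must sum to 1.
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("class probabilities must sum to 1")
    best_index = max(range(len(probabilities)), key=lambda j: probabilities[j])
    return class_labels[best_index]


# Hypothetical labels and made-up probabilities for three objects.
labels = ["grass lawn", "handmade paper", "cotton canvas"]
class_probabilities = [
    [0.70, 0.20, 0.10],
    [0.15, 0.25, 0.60],
    [0.30, 0.45, 0.25],
]

predicted = [classify(row, labels) for row in class_probabilities]
print(predicted)
```

Each row holds one object's probabilities across the classes, so every object ends up in exactly one class. How those probabilities are estimated depends on the classification technique used.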

The performance of a classification rule is measured through its misclassification probability. The following classification techniques are applied here:

Linear Discriminant Analysis
K Nearest Neighbour
Classification Tree
Random Forest
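As a rough illustration of how a misclassification probability might be estimated by cross-validation, the sketch below uses a hand-written K nearest neighbour classifier on simulated toy data. Several things here are assumptions and not part of this case: the toy data (two well-separated clusters labelled "A" and "B"), the choice of k = 3 neighbours, the use of ten folds (suggested only by the ten TXTrain/TXTest file pairs), and all function names.

```python
import random
from collections import Counter


def knn_predict(train_x, train_y, point, k=3):
    """Predict a label by majority vote among the k nearest training points."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], point)),
    )
    votes = Counter(train_y[i] for i in by_distance[:k])
    return votes.most_common(1)[0][0]


def misclassification_rate(true_labels, predicted_labels):
    """Fraction of objects whose predicted class differs from the true class."""
    wrong = sum(t != p for t, p in zip(true_labels, predicted_labels))
    return wrong / len(true_labels)


def cross_validated_error(x, y, n_folds=10, k=3, seed=0):
    """Average the misclassification rate over n_folds held-out folds."""
    indices = list(range(len(x)))
    random.Random(seed).shuffle(indices)
    folds = [indices[f::n_folds] for f in range(n_folds)]
    rates = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_idx = [i for i in indices if i not in held_out]
        train_x = [x[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        predictions = [knn_predict(train_x, train_y, x[i], k) for i in test_idx]
        rates.append(misclassification_rate([y[i] for i in test_idx], predictions))
    return sum(rates) / len(rates)


# Simulated toy data (not the texture data): two clusters in two dimensions.
rng = random.Random(1)
toy_x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(50)]
toy_x += [[rng.gauss(10, 1), rng.gauss(10, 1)] for _ in range(50)]
toy_y = ["A"] * 50 + ["B"] * 50

cv_error = cross_validated_error(toy_x, toy_y)
print(f"estimated misclassification rate: {cv_error:.3f}")
```

Each fold is held out once while the rule is built on the remaining data, so every object is classified by a rule that never saw it. The same loop could wrap any of the four techniques listed above in place of knn_predict.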
