10.2 - Support Vector Classifier

The maximal margin classifier is a very natural way to perform classification if a separating hyperplane exists. However, such a hyperplane is not guaranteed to exist, and even when it does, noisy data can make the maximal margin classifier a poor solution. In such cases the concept can be extended to a hyperplane that almost separates the classes, using what is known as a soft margin. The generalization of the maximal margin classifier to the non-separable case is known as the support vector classifier, in which a small proportion of the training sample is allowed to cross the margins or even the separating hyperplane.

Rather than seeking the largest possible margin with every observation on the correct side of it, which can make the margin very narrow or non-existent, the support vector classifier allows some observations to be on the incorrect side of the margin. The margin is soft because a small number of observations violate it. The softness is controlled by slack variables, which describe the position of each observation relative to the margin and the separating hyperplane. The support vector classifier maximizes this soft margin. The constraints of the optimization problem are modified to

\( y_i (\theta_0 + \theta_1 x_{1i} + \theta_2 x_{2i} + \cdots + \theta_n x_{ni}) \ge 1 - \epsilon_i \;\; \text{for every observation} \)

\( \text{where} \;\; \epsilon_i \ge 0 \;\; \text{and} \;\; \sum_{i=1}^{n}\epsilon_i \le C \)

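Here \(\epsilon_i\) is the slack corresponding to the \(i^{th}\) observation, and C is a regularization parameter set by the user. An observation with \(\epsilon_i = 0\) lies on the correct side of the margin, one with \(0 < \epsilon_i \le 1\) violates the margin but is not on the wrong side of the hyperplane, and one with \(\epsilon_i > 1\) is on the wrong side of the hyperplane. In the constraint above, C bounds the total slack, so a larger C tolerates more margin violations and a smaller C tolerates fewer. Many software implementations instead use C as a cost multiplying the total slack in the objective; under that convention a larger value of C leads to a larger penalty for errors.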
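To make the role of the slack variables concrete, the following minimal sketch computes \(\epsilon_i\) for a handful of points and checks the total against a budget C. The data, the coefficients `theta0` and `theta`, and the helper names are all hypothetical and chosen by hand for illustration; the hyperplane is not the result of solving the optimization problem.

```python
# Hypothetical toy data: two features per observation, labels in {-1, +1}.
X = [(2.0, 2.0), (1.0, 1.5), (0.5, 0.0), (-1.0, -1.0), (-2.0, -1.5), (0.5, 0.5)]
y = [1, 1, 1, -1, -1, -1]

# Hypothetical hyperplane coefficients, picked by hand (not fitted).
theta0 = 0.0
theta = (0.5, 0.5)


def decision_value(x):
    """theta_0 + theta_1 * x_1 + theta_2 * x_2 for one observation."""
    return theta0 + sum(t * xj for t, xj in zip(theta, x))


def slack(x, label):
    """Smallest epsilon >= 0 satisfying label * f(x) >= 1 - epsilon."""
    return max(0.0, 1.0 - label * decision_value(x))


def position(eps):
    """Describe where an observation sits, given its slack."""
    if eps == 0.0:
        return "correct side of margin"
    if eps <= 1.0:
        return "violates margin"
    return "wrong side of hyperplane"


def within_budget(C):
    """Check the constraint sum(epsilon_i) <= C for this hyperplane."""
    return total_slack <= C


slacks = [slack(x, label) for x, label in zip(X, y)]
total_slack = sum(slacks)

for x, label, eps in zip(X, y, slacks):
    print(x, label, eps, position(eps))
print("total slack:", total_slack)
```

With these numbers one observation violates the margin and one lies on the wrong side of the hyperplane, so this hyperplane is admissible only when the budget C is at least the total slack printed on the last line.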

However, there will be situations when a linear boundary simply does not work.