SVM is quite intuitive when the data are linearly separable. When they are not, as in the diagram below, SVM can be extended to handle nonlinear class boundaries.
There are two main steps in the nonlinear generalization of SVM. The first step transforms the original training (input) data into a higher-dimensional space using a nonlinear mapping. Once the data have been transformed, the second step finds a linear separating hyperplane in the new space. The maximal margin hyperplane found in the new space corresponds to a nonlinear separating hypersurface in the original space.
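As a minimal sketch of the two steps, assume one-dimensional toy data in which class +1 lies far from the origin and class -1 lies near it, so no single threshold on \(x\) separates them. Step one applies a hypothetical map \(\phi(x) = (1, x, x^2)\), where the constant 1 absorbs the intercept; step two searches for a linear separator in the mapped space. For brevity a perceptron stands in for the SVM solver: it finds a separating hyperplane whenever one exists, but not necessarily the maximal margin one. The names `phi`, `score`, `predict`, and `train_perceptron` are illustrative only.

```python
def phi(x):
    """Step 1 (hypothetical map): send a 1-D input x to (1, x, x^2).

    The constant 1 lets the first weight act as the intercept.
    """
    return (1.0, x, x * x)


def score(w, x):
    """Signed value of w . phi(x); its sign gives the predicted class."""
    return sum(wi * zi for wi, zi in zip(w, phi(x)))


def predict(w, x):
    return 1 if score(w, x) > 0 else -1


def train_perceptron(xs, ys, max_epochs=1000):
    """Step 2: find a separating hyperplane w . phi(x) = 0 in the mapped space.

    A perceptron stands in for the SVM solver here: it converges on linearly
    separable data but does not maximize the margin.
    """
    w = [0.0, 0.0, 0.0]
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(xs, ys):
            if y * score(w, x) <= 0:  # misclassified or lying on the hyperplane
                w = [wi + y * zi for wi, zi in zip(w, phi(x))]
                mistakes += 1
        if mistakes == 0:  # a full clean pass: every point is strictly separated
            return w
    raise RuntimeError("no separating hyperplane found within max_epochs")


# Toy data: +1 far from the origin, -1 near it. No single threshold on x
# separates the classes, but they become linearly separable after phi.
xs = [-3.0, -2.5, -2.0, -1.5, -0.5, -0.25, 0.0, 0.25, 0.5, 1.5, 2.0, 2.5, 3.0]
ys = [1 if abs(x) > 1 else -1 for x in xs]

w = train_perceptron(xs, ys)
print("weights (w0, w1, w2):", w)
print("training accuracy:", sum(predict(w, x) == y for x, y in zip(xs, ys)) / len(xs))
```

The learned weights define \(w_0 + w_1 x + w_2 x^2 = 0\): a straight line in the mapped space, but up to two threshold points on the original \(x\) axis, which is a nonlinear boundary in the original space.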
Example: Feature Expansion
Suppose the original feature space includes two variables \(X_1\) and \(X_2\). Using a second-degree polynomial transformation, the space is expanded to \((X_1, X_2, X_1^2, X_2^2, X_1X_2)\). A separating hyperplane in the expanded space then has the form
\(\theta_0 + \theta_1 X_1 + \theta_2 X_2 + \theta_3 X_1^2 + \theta_4 X_2^2 + \theta_5 X_1 X_2 = 0\)
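To make the equation concrete, the sketch below codes the degree-2 map and evaluates the left-hand side of the hyperplane equation. The helper names `expand` and `decision` are hypothetical, and so are the coefficients: they are picked by hand rather than learned by an SVM. With \(\theta = (-1, 0, 0, 1, 1, 0)\) the equation reduces to \(X_1^2 + X_2^2 - 1 = 0\), which is a hyperplane in the five expanded features but the unit circle in the original \((X_1, X_2)\) plane.

```python
def expand(x1, x2):
    """Degree-2 feature map from the text: (X1, X2) -> (X1, X2, X1^2, X2^2, X1*X2)."""
    return (x1, x2, x1 ** 2, x2 ** 2, x1 * x2)


def decision(theta, x1, x2):
    """Left-hand side of the hyperplane equation; theta = (theta0, theta1, ..., theta5)."""
    theta0, *weights = theta
    return theta0 + sum(w * f for w, f in zip(weights, expand(x1, x2)))


# Hypothetical, hand-picked coefficients (not learned): -1 + X1^2 + X2^2 = 0,
# which is linear in the five expanded features but is the unit circle in (X1, X2).
theta = (-1.0, 0.0, 0.0, 1.0, 1.0, 0.0)

for point in [(0.0, 0.0), (0.5, 0.5), (1.0, 0.0), (2.0, 1.0)]:
    value = decision(theta, *point)
    region = "outside" if value > 0 else ("on the boundary" if value == 0 else "inside")
    print(point, value, region)
```

The four points give -1.0, -0.5, 0.0 and 4.0: two inside the circle, one on it, and one outside, even though the classifier only ever computes a weighted sum of features.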
Because the hyperplane is linear in the expanded features but quadratic in \(X_1\) and \(X_2\), it yields a nonlinear decision boundary in the original feature space. If terms up to the second degree are considered, the 2 features are expanded to 5. If terms up to the third degree are considered, the same two features are expanded to 9, adding \(X_1^3, X_2^3, X_1^2X_2, X_1X_2^2\). A linear support vector classifier in the expanded space therefore solves a nonlinear problem in the original, lower-dimensional space.
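The counts 5 and 9 come from enumerating monomials. The sketch below (the function name `monomials` is hypothetical) lists every monomial of total degree 1 through \(d\) in \(p\) variables, excluding the constant term. Its length matches the standard combinatorial count \(\binom{p+d}{d} - 1\), which shows how quickly the expanded space grows with \(p\) and \(d\).

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb


def monomials(p, d):
    """List every monomial of total degree 1..d in variables X1..Xp (constant term excluded)."""
    terms = []
    for degree in range(1, d + 1):
        # Each multiset of variable indices of size `degree` is one monomial,
        # e.g. (1, 1, 2) stands for X1 * X1 * X2 = X1^2 * X2.
        for combo in combinations_with_replacement(range(1, p + 1), degree):
            powers = Counter(combo)
            factors = [f"X{i}" if e == 1 else f"X{i}^{e}" for i, e in sorted(powers.items())]
            terms.append("*".join(factors))
    return terms


for d in (2, 3):
    terms = monomials(2, d)
    print(f"degree <= {d}: {len(terms)} features -> {terms}")

# The count matches the closed form C(p + d, d) - 1.
assert all(len(monomials(p, d)) == comb(p + d, d) - 1 for p in range(1, 6) for d in range(1, 5))
```

For instance, 10 original features expanded up to third degree already give \(\binom{13}{3} - 1 = 285\) features, which is why explicit expansion becomes expensive as the degree or the number of inputs increases.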