10.4 - Kernel Functions

Explicitly transforming the input data into a higher-dimensional space can be difficult. There are many possible nonlinear mappings to choose from, and computing the transformed data can be computationally expensive. Kernel functions are introduced to avoid some of these problems.

It turns out that when the quadratic optimization problem of the linear SVM is solved in the transformed space, the training data points appear only in the form of inner products of their nonlinear transformations. The inner product of two n-dimensional vectors is defined as

\(X_1 \cdot X_2 = \sum_{j=1}^{n} x_{1j} x_{2j}\)

where \(X_1 = (x_{11}, x_{12}, \ldots, x_{1n})\) and \(X_2 = (x_{21}, x_{22}, \ldots, x_{2n})\). A kernel function, denoted \(K(X_1, X_2)\), gives the inner product of the nonlinear transformations of \(X_1\) and \(X_2\): if \(\phi\) is the nonlinear mapping, then \(K(X_1, X_2) = \phi(X_1) \cdot \phi(X_2)\). Wherever such an inner product appears in the training algorithm, it is replaced by the kernel function. In this way, all calculations are carried out in the original input space, which has lower dimensionality, and the transformation never has to be computed explicitly. Common kernels include the polynomial kernel, the sigmoid kernel, and the Gaussian radial basis function (RBF) kernel. Each of these results in a different nonlinear classifier in the original input space. There is no golden rule for determining which kernel will give the most accurate result in a given situation. In practice, however, the choice of kernel usually does not make a large difference in the resulting accuracy.
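The short Python sketch below illustrates the idea. Since the section gives no kernel formulas, it assumes the common textbook forms: polynomial \((X_1 \cdot X_2 + 1)^h\), Gaussian RBF \(\exp(-\|X_1 - X_2\|^2 / 2\sigma^2)\), and sigmoid \(\tanh(\kappa X_1 \cdot X_2 - \delta)\). The function names, the parameter defaults (`h`, `sigma`, `kappa`, `delta`), and the explicit mapping `phi_quadratic` are hypothetical choices for illustration. The sketch checks that, for 2-D inputs, evaluating the degree-2 polynomial kernel in the original space gives the same value as mapping both points into a 6-D space and taking the inner product there.

```python
import math


def inner_product(x1, x2):
    """Inner product sum_j x1j * x2j of two n-dimensional vectors."""
    if len(x1) != len(x2):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x1, x2))


def phi_quadratic(x):
    """Hypothetical explicit degree-2 feature map for a 2-D input x = (a, b).

    Constructed so that phi(x) . phi(z) == (x . z + 1) ** 2.
    """
    a, b = x
    r2 = math.sqrt(2)
    return (a * a, b * b, r2 * a * b, r2 * a, r2 * b, 1.0)


def polynomial_kernel(x1, x2, h=2):
    """Polynomial kernel of degree h (assumed form): (X1 . X2 + 1) ** h."""
    return (inner_product(x1, x2) + 1) ** h


def rbf_kernel(x1, x2, sigma=1.0):
    """Gaussian RBF kernel (assumed form): exp(-||X1 - X2||^2 / (2 sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(x1, x2, kappa=1.0, delta=0.0):
    """Sigmoid kernel (assumed form): tanh(kappa * X1 . X2 - delta)."""
    return math.tanh(kappa * inner_product(x1, x2) - delta)


if __name__ == "__main__":
    x1, x2 = (1.0, 2.0), (3.0, -1.0)
    # Route 1: map both points into the 6-D space, then take the inner product there.
    explicit = inner_product(phi_quadratic(x1), phi_quadratic(x2))
    # Route 2: evaluate the kernel directly in the original 2-D space.
    via_kernel = polynomial_kernel(x1, x2, h=2)
    print(explicit, via_kernel)
    print(rbf_kernel(x1, x2), sigmoid_kernel(x1, x2))
```

For the two sample points, both routes give 4 (up to floating-point rounding in the explicit route), but the kernel route never constructs the 6-D vectors. For higher degrees or higher input dimensions, the explicit feature space grows quickly while the kernel evaluation stays a single inner product in the original space.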