12.4 - Initialization

We bypassed this issue earlier. We had an example where we showed that k-means can result in different solutions for different initializations. The problem is that we do not have any theoretical guidance on how to pick the initial prototypes. Many times researchers don't know whether the initialization is a good one until they have completed the k-means algorithm and have the results in hand. Different ways have been proposed to initialize the k-means algorithm.

The simplest scheme is just to randomly pick some data points from the training data set and treat these as the initial prototypes. There are some other more sophisticated ways for initialization. Sometimes, people try different initializations and pick one that results in the minimum value for the objective function.

There are also methods that randomly pick the prototypes using another method to start the k-means iteration. For example, one may just pick a set of initial prototypes as random samples of a certain distribution.

Initialization in the above simulation:

  • Generated M random vectors with independent dimensions. For each dimension, the feature is uniformly distributed in [-1, 1].

For instance, suppose the dimension is p and the number of prototypes is M. Randomly pick a point in [-1,1] and do this p times to obtain a p-dimensional vector. If the random number generator generates a value in the range [0,1] you can simply multiply the value by 2 and subtract 1 to achieve a uniform distribution on [-1,1].

– Linearly transform the jth variable, \(Z_j , j = 1, 2, \cdots , p\) in each prototype (a vector) by: \(Z_js_j + m_j\), where \(s_j\) is the sample standard deviation of dimension j and \(m_j\) is the sample mean of dimension j, both computed using the training data.

During a k-means iteration, it may happen that a prototype has no data point assigned to it. We call this an “empty” prototype. When this happens, we can remove this prototype to reduce the number of prototypes by 1. If we don’t want to reduce the number of prototypes, we can further divide points in a non-empty prototype arbitrarily into two groups and compute the means for the two groups. The two means will serve as two prototypes to replace the previous prototype.