The Data Science Lab

Spectral Data Clustering from Scratch Using C#

The code for the demo's ProcessEmbedding() method is:

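private int[] ProcessEmbedding(double[][] E)
{
  // E is the spectral embedding: one row per source data item.
  // Cluster the embedding rows using ordinary k-means.
  KMeans km = new KMeans(E, this.k);
  int[] clustering = km.Cluster();
  return clustering;
}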

Almost too simple. The demo program defines a nested KMeans class inside the top-level Spectral class. An alternative design is to define the KMeans class as a standalone class outside the Spectral class.
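The demo's KMeans class isn't listed in this section. The following is a rough illustration of what such a class might contain: a minimal standalone sketch, defined outside any Spectral class as in the alternative design, that matches the usage in ProcessEmbedding(). The constructor takes the data and k, and Cluster() returns an int array of cluster IDs. The maxIter and seed parameters, the random-item centroid initialization, and keeping the old centroid when a cluster goes empty are all assumptions of this sketch. They are not necessarily what the demo does.

```csharp
using System;

// Hypothetical minimal k-means sketch (NOT the demo's actual KMeans class).
// It mirrors the usage shown above: new KMeans(data, k), then Cluster().
public class KMeans
{
    private readonly double[][] data;
    private readonly int k;
    private readonly int maxIter;   // assumption: cap on assign/update rounds
    private readonly Random rnd;    // assumption: seeded for reproducibility

    public KMeans(double[][] data, int k, int maxIter = 100, int seed = 0)
    {
        if (k < 1 || k > data.Length)
            throw new ArgumentException("k must be between 1 and the number of items");
        this.data = data;
        this.k = k;
        this.maxIter = maxIter;
        this.rnd = new Random(seed);
    }

    public int[] Cluster()
    {
        int n = data.Length;
        int dim = data[0].Length;

        // Initialize centroids from k distinct, randomly chosen items
        // (partial Fisher-Yates shuffle of the item indices).
        int[] idx = new int[n];
        for (int i = 0; i < n; ++i) idx[i] = i;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; ++c)
        {
            int r = rnd.Next(c, n);
            (idx[c], idx[r]) = (idx[r], idx[c]);
            centroids[c] = (double[])data[idx[c]].Clone();
        }

        int[] clustering = new int[n];
        for (int i = 0; i < n; ++i) clustering[i] = -1;  // nothing assigned yet

        for (int iter = 0; iter < maxIter; ++iter)
        {
            // Assignment step: each item moves to its nearest centroid.
            bool changed = false;
            for (int i = 0; i < n; ++i)
            {
                int best = 0;
                double bestDist = double.MaxValue;
                for (int c = 0; c < k; ++c)
                {
                    double d = SqDist(data[i], centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (clustering[i] != best) { clustering[i] = best; changed = true; }
            }
            if (!changed) break;  // converged: no item switched clusters

            // Update step: each centroid becomes the mean of its items.
            double[][] sums = new double[k][];
            int[] counts = new int[k];
            for (int c = 0; c < k; ++c) sums[c] = new double[dim];
            for (int i = 0; i < n; ++i)
            {
                int ci = clustering[i];
                ++counts[ci];
                for (int j = 0; j < dim; ++j) sums[ci][j] += data[i][j];
            }
            for (int c = 0; c < k; ++c)
            {
                if (counts[c] == 0) continue;  // empty cluster: keep old centroid
                for (int j = 0; j < dim; ++j) centroids[c][j] = sums[c][j] / counts[c];
            }
        }
        return clustering;
    }

    // Squared Euclidean distance (the square root isn't needed for comparisons).
    private static double SqDist(double[] a, double[] b)
    {
        double sum = 0.0;
        for (int j = 0; j < a.Length; ++j)
            sum += (a[j] - b[j]) * (a[j] - b[j]);
        return sum;
    }
}
```

With this sketch, a call such as new KMeans(E, 2).Cluster() returns one cluster ID per row of E. Because k-means results depend on the initial centroids, a common refinement is to run it several times with different seeds and keep the clustering that has the smallest total within-cluster squared distance.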

Wrapping Up
The ideas behind spectral clustering are based on graph theory and were mostly introduced in the 1960s. But using the ideas for machine learning data clustering is relatively new, dating from roughly 2001. Spectral clustering is sometimes applied to image data, where the technique is known as segmentation-based object categorization.

I suspect that there is one main reason why spectral data clustering is not used very often. The technique is based on deep mathematics, and people who don't have a basic familiarity with similarity metrics, affinity matrices, Laplacian matrices, eigenvalues and eigenvectors may shy away from using spectral clustering. This is especially true because much simpler clustering techniques are available, notably k-means and DBSCAN, and because there is no objective measure of how good any data clustering result is.
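To make two of those terms concrete, here is a minimal sketch of how an affinity matrix and a Laplacian matrix can be built from raw data. It assumes an RBF (Gaussian) similarity with a hypothetical gamma parameter and the unnormalized Laplacian L = D - A. A given implementation may use a different similarity function or a normalized Laplacian. The SpectralMath, RbfAffinity, and Laplacian names are illustrative only.

```csharp
using System;

// Hypothetical helpers (not the article's code) illustrating an RBF
// affinity matrix and the unnormalized graph Laplacian L = D - A.
public static class SpectralMath
{
    // A[i][j] = exp(-gamma * ||x_i - x_j||^2) for i != j; A[i][i] = 0.
    // gamma is an assumed tuning parameter controlling how fast similarity decays.
    public static double[][] RbfAffinity(double[][] X, double gamma)
    {
        int n = X.Length;
        double[][] A = new double[n][];
        for (int i = 0; i < n; ++i) A[i] = new double[n];
        for (int i = 0; i < n; ++i)
        {
            for (int j = i + 1; j < n; ++j)
            {
                double sq = 0.0;
                for (int d = 0; d < X[i].Length; ++d)
                    sq += (X[i][d] - X[j][d]) * (X[i][d] - X[j][d]);
                double a = Math.Exp(-gamma * sq);
                A[i][j] = a;
                A[j][i] = a;  // affinity is symmetric
            }
        }
        return A;
    }

    // L = D - A, where D is diagonal with D[i][i] = sum of row i of A
    // (the "degree" of item i). Every row of L sums to zero.
    public static double[][] Laplacian(double[][] A)
    {
        int n = A.Length;
        double[][] L = new double[n][];
        for (int i = 0; i < n; ++i)
        {
            L[i] = new double[n];
            double degree = 0.0;
            for (int j = 0; j < n; ++j)
            {
                degree += A[i][j];
                L[i][j] = -A[i][j];
            }
            L[i][i] += degree;
        }
        return L;
    }
}
```

In the standard unnormalized formulation, the eigenvectors of L associated with its smallest eigenvalues are then stacked column-wise to form an embedding like the E matrix passed to ProcessEmbedding(). Computing them requires an eigen-solver, which this sketch omits.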


About the Author

Dr. James McCaffrey works for Microsoft Research in Redmond, Wash. He has worked on several Microsoft products including Azure and Bing. James can be reached at [email protected].
