Clustering Perturbation Resilient Instances


Gave a talk at the IIIT Bangalore Theory Club on our work on clustering with center proximity and min-max k-means problem. Link to the slides.

Abstract: Euclidean k-means clustering, a problem having numerous applications, is NP-hard in the worst case but often solved efficiently in practice using simple heuristics. A quest for understanding the properties of real-world data sets that allow efficient clustering has lead to the notion of the perturbation resilience and center proximity. In the first part of the talk, we’ll see an algorithm to recover the optimal k-means clustering in perturbation resilient instances, and some lower bounds. In some cases, clustering with the k-means objective may result in a few clusters of very large cost and many clusters of small cost. This can be undesirable when we have a budget constraint on the cost of each cluster. Motivated by this, we study the “min-max k-means” clustering objective. In the second part of the talk, we’ll see approximation algorithms for the min-max k-means problem. This is a joint work with Dr Amit Deshpande and Dr Anand Louis.