## On Euclidean k-means with α-Center Proximity

Published in *AISTATS*, 2019

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are *stable* under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is *center proximity*, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $ \alpha > 1 $.

We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $ \alpha $-center proximity. We give a simple algorithm to find the optimal $ \alpha $-center-proximal $k$-means clustering in running time exponential in k and $ 1/(\alpha−1) $ but linear in the number of points and the dimension. We define an analogous $ \alpha $-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $ \alpha $-center proximity. On the hardness side we show that for any $ \alpha’ > 1 $, there exists an $ \alpha \leq \alpha’ $, $ (\alpha > 1) $, and an $ \varepsilon_0>0 $ such that minimizing the $k$-means objective over clusterings that satisfy $ \alpha $-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor. Find the full paper here.

Recommended citation: Amit Deshpande, Anand Louis, Apoorv Singh ; Proceedings of Machine Learning Research, PMLR 89:2087-2095, 2019 __http://proceedings.mlr.press/v89/deshpande19a.html__