Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is available for digesting as well.

Pages

About me

Posts

A Note About Sparse Optimization

2 minute read

Published:

For an under-determined system of linear equations, there exist infinitely many solutions. However, with more information, for example, if we know that the optimal solution is sparse, we can recover the desired solution.
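To make this concrete, here is a minimal sketch of the idea via basis pursuit ($\ell_1$ minimization); the matrix sizes, sparsity level, and random data below are illustrative assumptions, not taken from the post:

```python
import numpy as np
from scipy.optimize import linprog

# Illustrative sizes (assumed): 30 equations, 100 unknowns, 3 non-zeros.
rng = np.random.default_rng(0)
m, n, s = 30, 100, 3
A = rng.standard_normal((m, n))
x_true = np.zeros(n)
x_true[rng.choice(n, size=s, replace=False)] = rng.standard_normal(s)
b = A @ x_true  # under-determined system A x = b

# Basis pursuit: minimize ||x||_1 subject to A x = b,
# cast as a linear program via the split x = u - v with u, v >= 0.
c = np.ones(2 * n)
res = linprog(c, A_eq=np.hstack([A, -A]), b_eq=b, bounds=(0, None))
x_hat = res.x[:n] - res.x[n:]
print("recovery error:", np.linalg.norm(x_hat - x_true))
```

With enough random measurements relative to the sparsity level, the $\ell_1$-minimal solution typically coincides with the sparse ground truth.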

ICTS Summer School on Advances in Applied Probability

less than 1 minute read

Published:

I have been attending an awesome summer school on Advances in Applied Probability at ICTS. Find the link here! I have learnt many interesting things here. The ones most related to my interests were the optimal transport course by Dr Jose Blanchet, the non-parametric matrix estimation course by Dr Devavrat Shah, and a talk on Gaussian mean estimation by Dr Praneeth Netrapalli. Find the videos for these talks on YouTube.

Portfolio

Publications

On Euclidean k-means with α-Center Proximity

Published in AISTATS, 2019

$k$-means clustering is NP-hard in the worst case but previous work has shown efficient algorithms assuming the optimal $k$-means clusters are stable under additive or multiplicative perturbation of data. This has two caveats. First, we do not know how to efficiently verify this property of optimal solutions that are NP-hard to compute in the first place. Second, the stability assumptions required for polynomial time $k$-means algorithms are often unreasonable when compared to the ground-truth clusters in real-world data. A consequence of multiplicative perturbation resilience is center proximity, that is, every point is closer to the center of its own cluster than the center of any other cluster, by some multiplicative factor $ \alpha > 1 $.
We study the problem of minimizing the Euclidean $k$-means objective only over clusterings that satisfy $\alpha$-center proximity. We give a simple algorithm to find the optimal $\alpha$-center-proximal $k$-means clustering in running time exponential in $k$ and $1/(\alpha - 1)$ but linear in the number of points and the dimension. We define an analogous $\alpha$-center proximity condition for outliers, and give similar algorithmic guarantees for $k$-means with outliers and $\alpha$-center proximity. On the hardness side, we show that for any $\alpha' > 1$, there exist an $\alpha \leq \alpha'$, $(\alpha > 1)$, and an $\varepsilon_0 > 0$ such that minimizing the $k$-means objective over clusterings that satisfy $\alpha$-center proximity is NP-hard to approximate within a multiplicative $(1+\varepsilon_0)$ factor. Find the full paper here.
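To illustrate the condition itself (this is only a checker on made-up data, not the paper's algorithm): a clustering satisfies $\alpha$-center proximity when every point is $\alpha$ times closer to its own cluster's center than to any other cluster's center.

```python
import numpy as np

def satisfies_center_proximity(X, labels, alpha):
    """Check alpha-center proximity: for every point x in cluster i,
    alpha * ||x - c_i|| < ||x - c_j|| must hold for every other center c_j."""
    centers = np.array([X[labels == c].mean(axis=0) for c in np.unique(labels)])
    for x, own in zip(X, labels):
        dists = np.linalg.norm(centers - x, axis=1)  # distance to every center
        if np.any(alpha * dists[own] >= np.delete(dists, own)):
            return False
    return True

# Two well-separated Gaussian blobs (hypothetical data) should satisfy alpha = 2.
rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((20, 2)), rng.standard_normal((20, 2)) + 8.0])
labels = np.array([0] * 20 + [1] * 20)
print(satisfies_center_proximity(X, labels, alpha=2.0))  # expected: True
```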

Recommended citation: Amit Deshpande, Anand Louis, Apoorv Singh. On Euclidean k-means with α-Center Proximity. Proceedings of Machine Learning Research, PMLR 89:2087-2095, 2019. http://proceedings.mlr.press/v89/deshpande19a.html

Approximation Algorithms for Cost-Balanced Clustering

Published in Preprint, 2019

Clustering points in Euclidean space is a fundamental problem in the theory of algorithms and in unsupervised learning. Various clustering objectives that quantify the quality of a clustering have been proposed and studied; the $k$-means and $k$-median objectives are the most popular ones. In some cases, the $k$-means or the $k$-median objective may result in a few clusters of very large cost and many clusters of extremely small cost. Even when the optimal clusters are balanced in size, some of them may have a huge variance. This is undesirable for quantization or when we have a budget constraint on the cost of each cluster. Motivated by this, we study the cost-balanced $k$-means and the cost-balanced $k$-median problems. For a $k$-clustering $O_1, \ldots, O_k$ of a given set of $n$ points $X \subset \mathbb R^d$, we define its cost-balanced $k$-means cost as:

$ \boxed{\mathcal K(O_1, \ldots, O_k) \stackrel{\mathrm{def}}{=} \max_{l \in [k]} \sum_{x \in O_l} \left\lVert x - \mu_l \right\rVert^2, \qquad \textrm{where } \mu_l = \frac{1}{\left\lvert O_l \right\rvert} \sum_{x \in O_l} x} $

In other words, we want to minimize the cost of the heaviest cluster, i.e., to balance the cost across clusters. For any $\varepsilon > 0$, we give a randomized algorithm with running time $\mathcal O\big(2^{\mathrm{poly}(k/\varepsilon)} n d\big)$ that gives a $(1+\varepsilon)$-approximation to the optimal cost-balanced $k$-means and the similarly defined optimal cost-balanced $k$-median clustering, using $k$ clusters, with constant probability. We define a more general version of the $k$-median clustering and the cost-balanced $k$-median clustering, and we name them $\ell_p$ cost $k$-clustering and $\ell_p$ cost-balanced $k$-clustering, respectively. Given a black-box algorithm that gives a constant-factor approximation to the $\ell_p$ cost $k$-clustering, we show a procedure running in time $\mathrm{poly}(n,k,p)$ that gives a bi-criteria $\mathcal O\big(1/\varepsilon^{1/p}\big)$-approximation to the optimal $\ell_p$ cost-balanced $k$-clustering, using $(1+\varepsilon)k$ clusters.
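As a small illustration, the objective above translates directly into code. This sketch (with hypothetical inputs) only evaluates the cost-balanced $k$-means cost of a given clustering; it is not the approximation algorithm from the paper:

```python
import numpy as np

def cost_balanced_kmeans_cost(X, labels):
    """The cost-balanced k-means objective defined above: the maximum,
    over clusters, of the sum of squared distances to the cluster mean."""
    costs = []
    for c in np.unique(labels):
        cluster = X[labels == c]
        mu = cluster.mean(axis=0)                   # centroid mu_l
        costs.append(((cluster - mu) ** 2).sum())   # sum_{x in O_l} ||x - mu_l||^2
    return max(costs)                               # cost of the heaviest cluster
```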

Recommended citation: Amit Deshpande, Anand Louis, Deval Patel, Apoorv Singh. Approximation Algorithms for Cost-Balanced Clustering. In Submission. https://www.dropbox.com/s/r5uwemki3zfvvyb/min_max_km.pdf?dl=0

Regularized Spectral Methods for Clustering Signed Networks

Published in Preprint, 2020

We study the problem of $k$-way clustering in signed graphs. Considerable attention in recent years has been devoted to analyzing and modeling signed graphs, where the affinity measure between nodes takes either positive or negative values. Recently, Cucuringu et al. [CDGT 2019] proposed a spectral method, namely SPONGE (Signed Positive over Negative Generalized Eigenproblem), which casts the clustering task as a generalized eigenvalue problem optimizing a suitably defined objective function. This approach is motivated by social balance theory, where the clustering task aims to decompose a given network into disjoint groups, such that individuals within the same group are connected by as many positive edges as possible, while individuals from different groups are mainly connected by negative edges. Through extensive numerical simulations, SPONGE was shown to achieve state-of-the-art empirical performance. On the theoretical front, [CDGT 2019] analyzed SPONGE and the popular Signed Laplacian method under the setting of a Signed Stochastic Block Model (SSBM), for $k=2$ equal-sized clusters, in the regime where the graph is moderately dense. In this work, we build on the results in [CDGT 2019] on two fronts for the normalized versions of SPONGE and the Signed Laplacian. Firstly, for both algorithms, we extend the theoretical analysis in [CDGT 2019] to the general setting of $k \geq 2$ unequal-sized clusters in the moderately dense regime. Secondly, we introduce regularized versions of both methods to handle sparse graphs – a regime where standard spectral methods underperform – and provide theoretical guarantees under the same SSBM model. To the best of our knowledge, regularized spectral methods have so far not been considered in the setting of clustering signed graphs. We complement our theoretical results with an extensive set of numerical experiments on synthetic data.
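For intuition, here is a sketch of the classical (unregularized) symmetrically normalized signed Laplacian pipeline that the abstract builds on; the input format and parameters are assumptions, and this is not the paper's regularized method:

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def signed_laplacian_clustering(A, k):
    """Spectral clustering with the normalized signed Laplacian
    L = I - D^{-1/2} A D^{-1/2}, where D = diag(sum_j |A_ij|).
    A is a symmetric signed adjacency matrix (entries in {-1, 0, +1})."""
    d = np.abs(A).sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))  # guard isolated nodes
    L = np.eye(A.shape[0]) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    _, vecs = eigh(L)                # eigenvalues in ascending order
    embedding = vecs[:, :k]          # embed nodes via the k smallest eigenvectors
    return KMeans(n_clusters=k, n_init=10).fit_predict(embedding)
```

Regularization, roughly, perturbs the degrees (or the adjacency matrix) before normalization so that the spectrum remains informative on sparse graphs; see the paper for the precise construction and guarantees.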

Recommended citation: Mihai Cucuringu, Apoorv Vikram Singh, Déborah Sulem, Hemant Tyagi. Regularized Spectral Methods for Clustering Signed Networks. In Submission. https://arxiv.org/abs/2011.01737

Reading

Talks

Teaching
