A Note About Sparse Optimization (2019-11-26, Apoorv Vikram Singh)

<p>An under-determined system of linear equations has infinitely many solutions. However, with more information, for example, if we know that the optimal solution is sparse, we can recover the desired solution.</p>
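<p>As a toy sketch of my own (not from the post): a single linear equation in two unknowns already has infinitely many solutions, all lying on a line, and only a couple of them are sparse.</p>

```python
import numpy as np

# One equation in two unknowns: x + 2y = 4 has infinitely many solutions,
# all lying on a line. Only two of them are 1-sparse.
A = np.array([[1.0, 2.0]])
b = np.array([4.0])

dense_sol = np.array([2.0, 1.0])    # one of infinitely many solutions
sparse_sol = np.array([4.0, 0.0])   # a solution with a single nonzero entry

print(np.allclose(A @ dense_sol, b), np.allclose(A @ sparse_sol, b))  # prints: True True
```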
<p>There are also cases where we are specifically interested in a sparse solution. For such problems we generally introduce a regularizer to force the solution to be sparse:</p>
<center> $ \min_{\theta} \ \text{loss}(\theta, \text{data}) + \lambda \, \text{regularizer}(\theta). $ </center>
<p>One of the most popular choices for the regularizer is the $ \ell_1 $ norm (when $ \theta $ is a vector) or the nuclear norm (when $ \theta $ is a matrix). I used to think that $ \ell_1 $ regularization makes sense because it is the tightest convex relaxation of the corresponding objective with $ \ell_0 $ regularization (which makes the problem non-convex).
However, from a talk by <a href="http://users.cms.caltech.edu/~venkatc/">Venkat Chandrasekaran</a>, I learnt that this intuition is essentially wrong.
The tightest convex relaxation of the objective with $ \ell_0 $ regularization need not be the one with $ \ell_1 $ regularization. Suppose the problem at hand is:</p>
<center> $ \min_{\theta} \ \text{loss}(\theta, \text{data}) + \lambda \, \ell_0(\theta). $ </center>
<p>Let the optimum value of this (non-convex) optimization problem be $ OPT $. Suppose the tightest convex relaxation of this problem is:</p>
<center> $ \min_{\theta} \ \text{loss}(\theta, \text{data}) + \lambda \, \ell_c(\theta). $ </center>
<p>Let the optimum value of this optimization problem be $ OPT_c $. The solution returned by this convex optimization need not be the sparsest. In fact, it could be the case that the sparsest solution is returned by this optimization problem:</p>
<center> $ \min_{\theta} \ \text{loss}(\theta, \text{data}) + \lambda \, \ell_1(\theta). $ </center>
<p>Let the optimum value of this objective function be $ OPT_1 $.</p>
<p>The above scenario is possible because convex relaxation guarantees nothing about the sparsity of the solution. It only tells us that $ OPT_c $ is at least as close to $ OPT $ as $ OPT_1 $ is. It turns out that, in many cases, $ \ell_1 $ regularization forces the solution to lie on a &ldquo;low-dimensional&rdquo; face of the polytope formed by the &ldquo;atoms&rdquo; (see <a href="https://arxiv.org/pdf/1012.0621.pdf">Convex Geometry of Linear Inverse Problems</a>).</p>
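<p>The contrast between the dense minimum-$ \ell_2 $-norm solution and a sparse $ \ell_1 $-regularized solution is easy to see numerically. The following is a minimal sketch of my own (a random instance, solved with iterative soft-thresholding, ISTA; none of it comes from the talks or papers above):</p>

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 20, 50, 3                        # 20 equations, 50 unknowns, 3 nonzeros
A = rng.standard_normal((n, d))
theta_true = np.zeros(d)
theta_true[rng.choice(d, size=k, replace=False)] = rng.standard_normal(k)
b = A @ theta_true                         # consistent under-determined system

# Minimum-l2-norm solution: generically dense.
theta_l2 = np.linalg.pinv(A) @ b

# ISTA for min_theta 0.5 * ||A theta - b||^2 + lam * ||theta||_1.
lam = 0.1
step = 1.0 / np.linalg.norm(A, 2) ** 2     # 1/L, L = Lipschitz constant of the gradient
theta_l1 = np.zeros(d)
for _ in range(5000):
    z = theta_l1 - step * A.T @ (A @ theta_l1 - b)                    # gradient step
    theta_l1 = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)   # soft-threshold

print("nonzeros in the min-l2-norm solution:", int(np.sum(np.abs(theta_l2) > 1e-6)))
print("nonzeros in the l1-regularized solution:", int(np.sum(np.abs(theta_l1) > 1e-6)))
```

<p>Nothing here is specific to ISTA; any lasso solver would show the same effect: the $ \ell_1 $ solution keeps only a handful of coordinates, while the minimum-norm solution uses essentially all of them.</p>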
<p>I also attended a <a href="http://seminaire.univ-lille1.fr/node/414">talk</a> by <a href="https://www.amensch.fr/">Dr Arthur Mensch</a>, where he showed <a href="https://arxiv.org/pdf/1802.03676.pdf">an example</a> (in the section on smoothed max operators) where $ \ell_2^2 $ regularization led to the sparsest solution!</p>
<p>I am looking forward to an interesting talk by Stephane Chretien on December 3 at <a href="https://modal.lille.inria.fr/wikimodal/doku.php?id=seminars">INRIA Lille</a>, on a new and simpler analysis of robust PCA using the descent-cone approach developed in the <a href="https://arxiv.org/abs/1303.6672">&ldquo;Living on the edge&hellip;&rdquo; paper by Amelunxen, Lotz, McCoy, and Tropp</a>.</p>

ICTS Summer School on Advances in Applied Probability (2019-08-16, Apoorv Vikram Singh)

<p>I have been attending an awesome summer school on Advances in Applied Probability at ICTS. <a href="https://www.icts.res.in/program/paap2019">Find the link here!</a> I have learnt many interesting things here. Those most related to my interests were: the optimal transport course by Dr Jose Blanchet, non-parametric matrix estimation by Dr Devavrat Shah, and a talk on Gaussian mean estimation by Dr Praneeth Netrapalli. Find the videos of these talks on <a href="https://www.youtube.com/watch?v=jfxUxlTHnjY&list=PL04QVxpjcnjg_94pKWf8WHFUCdjVAUHp3">YouTube</a>.</p>