Efficient Algorithms for Online Learning
Online Learning is a powerful framework for modeling many sequential-decsion problems, such as online portfolio optimization in finance, or online classification in machine learning. It is also an important theoretical machine learning paradigm that (as opposed to classical supervised statistical learning paradigm) interleaves between the training and testing phases of learning, and does not rely on a statistical model. While a great emphasis is usually being put on designing online algorithms with optimal convergence rates (regret), I tend to put emphasis (also) on the design of computationally-efficient algorithms that are scalable to large problems. Recently, I’m interested in showing that under more refined-models (e.g., combination of stochastic and adversarial) it is possible to obtain stronger guarantees with much more efficient algorithms.
Main results include:
- The first algorithm for Online Linear Optimization with an approximation oracle (these include many natural online variants of combinatorial problems such as TSP, Vertex-Cover and more) with only logarithmic oracle complexity, see NIPS17 paper.
- A result showing that in many cases of interest the computationally-efficient Online Gradient Descent algorithm achieves logarithmic regret on non-strongly convex sequences (previous such algorithms are much less scalable), see the recent manuscript.
- A result which analyzes the regret of the nonconvex online gradient ascent for Online PCA, showing that when the data follows a stochastic model subject to adversarial-perturbations this nonconvex optimization algorithm can provably minimize regret (greatly improving over previous convex optimization-based algorithms which are much less scalable), see the recent manuscript.
Projection-free Learning and Optimization
The computational bottleneck in scaling-up many machine learning optimization algorithms is the difficulty of computing orthogonal projections onto convex sets. A very popular example is that of low-rank matrix factorization problems, such as the highly popular matrix-completion problem which is at the heart of many digital-content recommendation systems. For such problems this projection amounts to computing very expensive matrix decompositions (e.g., full-rank singular-value decomposition). An alternative is to use so-called projection-free algorithms which are mostly based on the classical Conditional Gradient (CG) algrotihm (aka Frank-Wolfe), which apply much more efficient steps (e.g., only low-rank SVD in the context of low-rank matrix problems).
Main results include:
- The first linearly converging CG algorithm for optimization over polytopes (exponential improvement over previous results), and the first regret-optimal algorithm for online learning, see FOCS13 paper and SIOPT paper. An even more efficient algorithm with promising empirical performance is described in this NIPS16a paper.
- The first provably-faster CG algorithm for optimization with low-rank matrices, see NIPS16b paper.
- Faster analysis of CG (quadratic improvement) for strongly-convex sets, see ICML15a paper.
- Fast stochastic algorithms for low-rank and nonsmooth matrix problems, see here.
- Fast CG-type algorithms for several matrix-recovery problems, see here.
Efficient PCA Algorithms
Principal Component Analysis is one of the most celebrated and popular data-analysis procedures and as such, its efficient implementation is of significant interest. I have studied this fundamental problem in several computational settings of interest. Current results include:
- A general framework reducing top-eigenvector computation to solving only a short sequence of convex quadratic problems . This gives rise a stochastic algorithm for PCA which significantly improves over classical methods and is the fastest to date in a wide regime of parameters, see this manuscript and this ICML16 paper. Follow-up works which exhibit promising empirical performance consider the closely related problem of Canonical Correlation Analysis, see NIPS16c paper, and the use of coordinate-descent methods, see ALT18 paper.
- Communication-efficient distributed algorithms for stochastic estimation of principal components , see ICML17 paper.
- Time and memory efficient PCA computation in the streaming model , see SODA15 paper.
- Efficient PCA algorithms for adversarial online learning settings, see ICML15b paper, and the recent manuscript.
Fast Algorithms for Semidefinite Programming
Solving semidefinite programs is an important subprocedure in many state-of-the-art algorithms for ML and optimization. It is also notoriously difficult to solve in large scale. I have introduced the first sublinear-time algorithm for solving SDPs (see NIPS11 paper) which I later significantly improved in this Math. Prog. paper, and gives the current fastest running times in a wide regime of parameters. This result is further accelerated using my work on faster eigenvector computations, as described here (Sec. 7).
I have also developed a novel randomized smoothing technique for SDP which is particularly suitable for sparse data, as described in my dissertation (Sec. 4.7).