The aim of the series is to expose graduate students and early-career researchers to the beautiful and exciting mathematical theory that is needed in modern data science research.

Courses

 

Topics in statistical learning theory

Peter Bartlett

We review tools for analyzing the generalization performance of prediction methods on classification and regression problems. We cover uniform convergence properties and the associated notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions. We also discuss the analysis of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing, and conclude with results on the performance of interpolating prediction rules.
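For orientation, a standard uniform-convergence guarantee of the kind covered here (stated for illustration, for losses taking values in [0,1]) reads: with probability at least 1 - \delta,

\[
\sup_{f \in \mathcal{F}} \Big( R(f) - \widehat{R}_n(f) \Big) \;\le\; 2\,\mathfrak{R}_n(\mathcal{F}) + \sqrt{\frac{\log(1/\delta)}{2n}},
\]

where R(f) is the risk, \widehat{R}_n(f) the empirical risk on n samples, and \mathfrak{R}_n(\mathcal{F}) the (expected) Rademacher complexity of the loss class.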

 

Learning under latent symmetries

Subhro Ghosh

Machine learning under the constraint of symmetries, given by group invariances or equivariances, has emerged as a topic of active interest in recent years. Natural settings for such applications include multi-reference alignment, cryo-electron microscopy (cryo-EM), multi-object tracking, spherical images, and so on.

We will start with a gentle introduction to the problem of learning under latent symmetries, focusing on the model of multi-reference alignment (MRA) and, more generally, the discrete orbit recovery model. Despite significant interest, a clear picture of the rates of estimation in MRA has emerged only recently, especially in the practically important regime of high ambient noise (sigma >> 1), where the best-known results exhibit an asymptotic sample complexity of order sigma^6, whereas in the absence of latent symmetries it is known to be of order sigma^2.
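For concreteness, a common formulation of the MRA observation model (stated here for illustration; the notation is ours) is

\[
y_i \;=\; g_i \cdot \theta \,+\, \sigma\, \xi_i, \qquad i = 1, \dots, n,
\]

where \theta is the unknown signal, the g_i are unobserved (latent) group elements acting on \theta (for instance, cyclic shifts), the \xi_i are independent standard Gaussian noise vectors, and \sigma is the noise level. The sample complexities of order sigma^2, sigma^4 and sigma^6 mentioned above refer to the number of observations n needed to estimate the orbit of \theta as sigma grows.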

In recent work, we investigate this problem for sparse signals, where we unveil a remarkable intermediate sample complexity of order sigma^4. Our techniques exploit connections to classical topics such as crystallographic phase retrieval, the beltway problem from combinatorial optimization, and uniform uncertainty principles from harmonic analysis. In another direction, we will explore recently developed tools based on group representation theory for studying learning problems with inherent symmetries. These include applications to classical questions such as dictionary learning in the presence of invariances, and they yield a new kind of intrinsic coordinate system, based on group characters, for symmetry-augmented learning problems.
References:
[1] S. Ghosh and P. Rigollet. Sparse Multi-Reference Alignment: Phase Retrieval, Uniform Uncertainty Principles and the Beltway Problem. Foundations of Computational Mathematics (2022). https://doi.org/10.1007/s10208-022-09584-6
[2] S. Ghosh, A. Y. R. Low, Y. S. Soh, Z. Feng and B. K. Y. Tan. Dictionary Learning under Symmetries via Group Representations. arXiv preprint arXiv:2305.19557.

 

Concentration inequalities

Gabor Lugosi

In these lectures we review some techniques to prove concentration inequalities for general functions of independent random variables. Starting with inequalities for sums of independent random variables and the Efron-Stein inequality, we review the basic ideas of the so-called entropy method, with the Gaussian concentration inequality as a basic example. We also discuss the relationship of concentration of measure with isoperimetric problems and present Talagrand's "convex distance" inequality. We present applications to the suprema of empirical processes and threshold phenomena.
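To give the flavour of the inequalities involved, the Efron-Stein inequality (recalled here for reference) bounds the variance of a function of independent random variables X_1, ..., X_n by

\[
\mathrm{Var}\big(f(X_1,\dots,X_n)\big) \;\le\; \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\Big[ \big( f(X_1,\dots,X_i,\dots,X_n) - f(X_1,\dots,X_i',\dots,X_n) \big)^2 \Big],
\]

where X_1', ..., X_n' are independent copies of X_1, ..., X_n. Many of the refined exponential bounds obtained via the entropy method can be viewed as analogues of this variance bound.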

 

The small-ball method and (some of) its applications

Shahar Mendelson

The small-ball method allows one to derive nontrivial lower bounds on the infimum of certain random processes involving nonnegative random variables with potentially heavy tails (that is, their tail probabilities decay slowly). Realistic problems often involve heavy-tailed random variables, and prior to the introduction of the small-ball method such problems were completely out of reach of existing theory (for example, because two-sided concentration simply fails in such situations).
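A typical form of the small-ball assumption underlying the method (stated here for illustration, not as the precise formulation used in the lectures) is that there exist constants \kappa > 0 and \varepsilon \in (0,1) such that

\[
\Pr\big( |f(X)| \ge \kappa\, \|f\|_{L_2} \big) \;\ge\; \varepsilon \qquad \text{for every } f \in \mathcal{F}.
\]

This weak, one-sided condition requires no control of the upper tail of f(X), which is why it accommodates heavy-tailed distributions while still yielding high-probability lower bounds on quantities such as \inf_{f \in \mathcal{F}} \frac{1}{n}\sum_{i=1}^{n} f(X_i)^2.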

Since its discovery, the small-ball method has been the key ingredient in the solution to many open problems in diverse areas, ranging from Pure Mathematics (e.g., Convex Geometry, Random Matrix Theory) to Statistics and Signal Processing. I will present the main ideas of the method and a few (relatively simple) applications. If time permits I will also discuss directions in which the method can be extended.

 

An Introduction to Statistical Lower Bounds for Estimation and Learning

Jon Scarlett

A key component of the study of estimation and learning problems is sample complexity lower bounds, which provide algorithm-independent hardness results dictating that no procedure can attain a certain success criterion if the number of data samples falls below a certain threshold. Such results can be used to certify the near-optimality of algorithmic upper bounds, or to help identify where the greatest room for improvement remains. In this tutorial-style talk, I will present a variety of commonly used lower-bounding techniques. In the first part, I will provide a detailed overview of information-theoretic techniques such as Fano's inequality. In the second part, I will briefly summarise alternatives such as Le Cam's method, Assouad's method, change-of-measure techniques, and reductions based on communication complexity. Throughout the talk, example uses of these methods will be given in areas such as mean estimation, group testing, graph learning, convex optimisation, and multi-armed bandits.
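As one illustration of the information-theoretic route (recalled here for reference, not as the talk's exact formulation), Fano's inequality for an M-ary hypothesis V, uniform on {1, ..., M} and estimated by \hat{V} from data Y, gives

\[
\Pr(\hat{V} \ne V) \;\ge\; 1 - \frac{I(V;Y) + \log 2}{\log M}.
\]

Bounding the mutual information I(V;Y) in terms of the number of samples then converts this into a sample complexity lower bound.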

 

The Many Faces of Exponential Weighting

Nikita Zhivotovskiy

The Exponential Weights (EW) algorithm is a versatile tool with broad utility across statistics and machine learning. Originally developed in information theory, EW comes with an elegant theoretical framework that applies to many problems. First, we will see how EW can be used to estimate discrete distributions. Next, we will examine its role in ensemble methods for aggregating multiple estimators. Finally, we will discuss EW in logistic and linear regression, where it enables regularization through an online learning approach.
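As a minimal sketch of the basic update (our own illustrative Python code, not the lecturer's material): each expert's weight is multiplied by exp(-eta * loss) and the weights are renormalised.

import numpy as np

def exponential_weights(losses, eta=1.0):
    """Exponential-weights aggregation over a T x K array of per-round losses.

    losses[t, k] is the loss incurred by expert/estimator k at round t.
    Returns the (T+1) x K array of weight vectors, starting from uniform.
    """
    T, K = losses.shape
    weights = np.full(K, 1.0 / K)                      # uniform prior over the K experts
    history = [weights.copy()]
    for t in range(T):
        weights = weights * np.exp(-eta * losses[t])   # down-weight poorly performing experts
        weights /= weights.sum()                       # renormalise to a probability distribution
        history.append(weights.copy())
    return np.array(history)

# Toy usage: aggregate three constant predictors of a scalar under squared loss.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=0.1, size=50)            # observations
preds = np.array([0.0, 1.0, 2.0])                      # three candidate estimators
losses = (y[:, None] - preds[None, :]) ** 2            # squared loss of each predictor per round
final_weights = exponential_weights(losses)[-1]
print(final_weights)   # mass concentrates on the predictor closest to the data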