
Annual Graduate School in Mathematical Aspects of Data Science

The first school in the Global School Series in Mathematical Aspects of Data Science, hosted by the Mathematical Sciences Institute (MSI) of the Australian National University (ANU) at the Darwin Convention Centre. The school is intended for graduate-level students and centred on the mathematical foundations of data science.

Date & time
Mon, Jun 24 2024, 8am - Fri, Jun 28 2024, 12pm
Contact
Tony Martin
Administrative Coordinator (Marketing, Conferences and Events), Mathematical Sciences Institute, Australian National University

About

The first school in the Global School Series in Mathematical Aspects of Data Science will be held from Mon, Jun 24 2024 to Fri, Jun 28 2024 at the Darwin Convention Centre in Darwin (Northern Territory, Australia). The aim of the series is to expose graduate students and early-career researchers to the beautiful and exciting mathematical theory that is needed in modern data science research.

This Winter School is made possible by a generous donation from Google Research Australia, which will be used to assist staff and students in attending our inaugural Winter School. The number of places is limited, so please register your participation as soon as possible to avoid disappointment.

This year, the school will consist of short courses presented by:

Peter Bartlett (University of California Berkeley and Google DeepMind)
Subhro Ghosh (National University of Singapore)
Gabor Lugosi (Pompeu Fabra University Barcelona)
Shahar Mendelson (Australian National University)
Jonathan Scarlett (National University of Singapore)
Nikita Zhivotovskiy (University of California Berkeley)

Scientific Committee

Gabor Lugosi (Pompeu Fabra University Barcelona), Shahar Mendelson (Australian National University), Roman Vershynin (University of California Irvine), Peter Bartlett (University of California Berkeley and Google Research)

Venue

Darwin Convention Centre

10 Stokes Hill Rd
Darwin City NT 0800
Australia


Sponsors

Courses

Topics in statistical learning theory

Peter Bartlett

We review tools for the analysis of the generalization performance of prediction methods on classification and regression problems. We cover uniform convergence properties and associated notions of complexity, such as Rademacher averages, covering numbers, and combinatorial dimensions; the analysis of the performance of nonparametric estimation methods such as nearest-neighbor rules and kernel smoothing; and results on the performance of interpolating prediction rules.
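As a flavour of the results covered, here is a standard uniform-convergence bound in terms of Rademacher averages (a textbook-style formulation, not necessarily the exact statement used in the course): for a class F of functions taking values in [0, 1],

\[
\mathbb{E}\Big[\sup_{f \in \mathcal{F}} \big( R(f) - \widehat{R}_n(f) \big)\Big] \;\le\; 2\, \mathfrak{R}_n(\mathcal{F}),
\qquad
\mathfrak{R}_n(\mathcal{F}) := \mathbb{E}\Big[\sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^{n} \varepsilon_i f(X_i)\Big],
\]

where R(f) is the risk, \widehat{R}_n(f) is the empirical risk on n samples, and the \varepsilon_i are independent uniform random signs.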

 

Learning under latent symmetries

Subhro Ghosh

Machine learning under the constraint of symmetries, given by group invariances or equivariances, has emerged as a topic of active interest in recent years. Natural settings for such applications include multi-reference alignment and cryo-electron microscopy (cryo-EM), multi-object tracking, spherical images, and so on.

We will start with a gentle introduction to the problem of learning under latent symmetries, focussing on the model of multi-reference alignment (MRA) and, more generally, the discrete orbit recovery model. Despite significant interest, a clear picture for understanding rates of estimation in MRA has emerged only recently, especially in the practically important regime of high ambient noise (sigma >> 1), where the best known results exhibit an asymptotic sample complexity of order sigma^6, whereas in the absence of latent symmetries it is known to be of order sigma^2.
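For orientation, a minimal form of the MRA observation model (the standard formulation from the literature; the notation here is illustrative and may differ from the lectures) is

\[
y_i \;=\; R_{g_i} x + \sigma \xi_i, \qquad i = 1, \dots, n,
\]

where x \in \mathbb{R}^d is the unknown signal, each g_i is an unknown latent cyclic shift acting through R_{g_i}, the \xi_i are i.i.d. standard Gaussian noise vectors, and \sigma is the noise level. The sample complexities of order sigma^2, sigma^4 and sigma^6 quoted here refer to the number of observations n needed to estimate x (up to a shift) as sigma grows.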

In recent work, we investigate this problem for sparse signals, where we unveil a remarkable intermediate sample complexity of order sigma^4. Our techniques exploit connections to classical topics, such as crystallographic phase retrieval, the beltway problem from combinatorial optimization, and uniform uncertainty principles from harmonic analysis. In another direction, we will explore recently emerged tools based on group representation theory to study learning problems with inherent symmetries. This includes applications to classical questions such as dictionary learning in the presence of invariances and entails a new kind of intrinsic coordinate system for symmetry-augmented learning problems based on group characters.
References:
[1] S. Ghosh and P. Rigollet, Sparse Multi-Reference Alignment: Phase Retrieval, Uniform Uncertainty Principles and the Beltway Problem, Foundations of Computational Mathematics (2022). https://doi.org/10.1007/s10208-022-09584-6
[2] S. Ghosh, A. Y. R. Low, Y. S. Soh, Z. Feng and B. K. Y. Tan, Dictionary Learning under Symmetries via Group Representations, arXiv preprint arXiv:2305.19557.

 

Concentration inequalities

Gabor Lugosi

In these lectures we review some techniques to prove concentration inequalities for general functions of independent random variables. Starting with inequalities for sums of independent random variables and the Efron-Stein inequality, we review the basic ideas of the so-called entropy method, with the Gaussian concentration inequality as a basic example. We also discuss the relationship of concentration of measure with isoperimetric problems and present Talagrand's "convex distance" inequality. We present applications to the suprema of empirical processes and threshold phenomena.
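As a concrete starting point, the Efron-Stein inequality in its standard form (given here only as a reference for the first lecture's theme): for any square-integrable function f of independent random variables X_1, ..., X_n,

\[
\mathrm{Var}\big(f(X_1,\dots,X_n)\big) \;\le\; \frac{1}{2} \sum_{i=1}^{n} \mathbb{E}\Big[\big(f(X_1,\dots,X_n) - f(X_1,\dots,X_{i-1},X_i',X_{i+1},\dots,X_n)\big)^2\Big],
\]

where X_1', \dots, X_n' are independent copies of X_1, \dots, X_n.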

 

The small-ball method and (some of) its applications

Shahar Mendelson

The small-ball method allows one to derive nontrivial lower bounds on the infimum of certain random processes involving nonnegative random variables with potentially heavy tails (that is, their distribution functions decay slowly). Realistic problems often involve heavy-tailed random variables, and prior to the introduction of the small-ball method such problems were completely out of reach of existing theory (for example, because two-sided concentration is simply false in such situations).

Since its discovery, the small-ball method has been the key ingredient in the solution to many open problems in diverse areas, ranging from Pure Mathematics (e.g., Convex Geometry, Random Matrix Theory) to Statistics and Signal Processing. I will present the main ideas of the method and a few (relatively simple) applications. If time permits I will also discuss directions in which the method can be extended.
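Schematically, and under assumptions the lectures make precise, the method rests on a small-ball condition of the form

\[
\mathbb{P}\big( |f(X)| \ge \kappa \, \|f\|_{L_2} \big) \;\ge\; \delta \qquad \text{for every } f \in \mathcal{F},
\]

which, combined with a control on the complexity of the class and a binomial-type concentration argument, yields a high-probability lower bound on \inf_{f \in \mathcal{F}} \frac{1}{N} \sum_{i=1}^{N} f^2(X_i) that requires no upper-tail assumptions on the functions in the class. (This is only a rough sketch; the precise statements and constants are part of the course.)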

 

An Introduction to Statistical Lower Bounds for Estimation and Learning

Jon Scarlett

A key component of the study of estimation and learning problems is sample complexity lower bounds, which provide algorithm-independent hardness results dictating that no procedure can attain a certain success criterion if the number of data samples falls below a certain threshold.  Such results can be used to certify the near-optimality of algorithmic upper bounds, or to help identify where the greatest areas for improvement still remain.  In this tutorial-style talk, I will present a variety of commonly-used lower bounding techniques.  In the first part, I will provide a detailed overview of information theory techniques such as Fano’s inequality.  In the second part, I will briefly summarise alternatives such as Le Cam’s method, Assouad’s method, change-of-measure techniques, and reductions based on communication complexity.  Throughout the talk, various example uses of these methods will be given in areas such as mean estimation, group testing, graph learning, convex optimisation, and multi-armed bandits.
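For example, Fano's inequality in its standard form: if V is uniformly distributed over M hypotheses, Y is the observed data, and \widehat{V} is any estimator of V computed from Y, then

\[
\mathbb{P}\big(\widehat{V} \ne V\big) \;\ge\; 1 - \frac{I(V; Y) + \log 2}{\log M},
\]

so the error probability stays bounded away from zero whenever the mutual information I(V; Y) grows more slowly than \log M. Sample complexity lower bounds then follow by bounding I(V; Y) in terms of the number of samples.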

 

The Many Faces of Exponential Weighting

Nikita Zhivotovskiy

The Exponential Weights (EW) algorithm is a versatile tool with broad utility across statistics and machine learning. Originally developed in information theory, EW has an elegant theoretical framework that tackles many problems. First, we will see how EW can estimate discrete distributions. Next, we will examine its role in ensemble methods for aggregating multiple estimators. Finally, we will discuss EW in logistic and linear regression, where it enables regularization through an online learning approach.
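As an informal illustration of the mechanism, here is a minimal sketch of the generic exponential-weights update for aggregating a finite pool of predictors; the function name and the squared-error loss below are illustrative choices, not the specific settings treated in the course.

import numpy as np

def exponential_weights(expert_preds, outcomes, eta=0.5):
    # expert_preds: array of shape (T, M), prediction of each of M experts at each of T rounds
    # outcomes:     array of shape (T,), observed outcome at each round
    # eta:          learning rate controlling how strongly past losses are discounted
    T, M = expert_preds.shape
    cum_loss = np.zeros(M)                    # cumulative loss of each expert so far
    aggregated = np.zeros(T)
    for t in range(T):
        # weights proportional to exp(-eta * cumulative loss); shift by the minimum for numerical stability
        w = np.exp(-eta * (cum_loss - cum_loss.min()))
        w /= w.sum()
        aggregated[t] = w @ expert_preds[t]   # exponentially weighted average prediction
        cum_loss += (expert_preds[t] - outcomes[t]) ** 2   # squared-error loss (illustrative)
    return aggregated

The same update, with different losses and different priors over the pool, underlies the distribution-estimation, aggregation and regression settings mentioned above.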

Schedule

Monday, 24 June
09:00 - 11:00  Gabor Lugosi
11:30 - 12:30  Nikita Zhivotovskiy
14:00 - 16:00  Peter Bartlett
16:30 - 17:30  Jonathan Scarlett

Tuesday, 25 June
09:00 - 11:00  Peter Bartlett
11:30 - 12:30  Shahar Mendelson
14:00 - 16:00  Jonathan Scarlett
16:30 - 17:30  Subhro Ghosh

Wednesday, 26 June
09:00 - 11:00  Nikita Zhivotovskiy
11:30 - 12:30  Jonathan Scarlett
14:00 - 16:00  Subhro Ghosh

Thursday, 27 June
09:00 - 11:00  Shahar Mendelson
11:30 - 12:30  Nikita Zhivotovskiy
14:00 - 16:00  Gabor Lugosi
16:30 - 17:30  Subhro Ghosh

Friday, 28 June
08:30 - 10:30  Gabor Lugosi
10:45 - 11:45  Shahar Mendelson

Accommodation

BEWARE OF SCAMS: do not trust any solicitations to book your accommodation. We have not appointed external organisations or travel agents to ‘finalise a reservation’. Please contact the organisers directly if you have concerns about a message you have received about bookings.

The venue is located within Darwin’s Waterfront Precinct, amongst many hotels, cafes, bars and restaurants in close proximity.  Competitive rates have been secured with the following hotels for conference participants:

Alternative 3 to 3.5-star accommodation options:

  • The Cavenagh Hotel (4 min walk)
  • MOM Hostel - Darwin YHA (6 min walk)
  • Capitanos (7 min walk)

Further options are available at Top End Northern Territory.

There is a free shuttle bus that services the waterfront precinct at lunch time and dinner time.

Please note that rooms and rates are subject to availability at the time of booking.

June is the high season in the Northern Territory; we recommend you book your flights, accommodation, and tours as early as possible to take advantage of the competitive rates.


Tourism

Explore the Northern Territory before or after the conference with Tourism Top End.