Parallel optimisation algorithms for large-scale machine learning

The rise of machine learning has led to an explosion in the demand for optimisation algorithms tailored to very large-scale problems. For instance, training deep neural networks requires solving nonlinear, nonconvex optimisation problems with potentially millions of variables. One approach to making algorithms more scalable is to use parallel computing, with each processor or machine optimising over its own subset of the variables.
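To make the idea of "each processor optimising a subset of variables" concrete, the following is a minimal sketch of a synchronous, Jacobi-style parallel block-coordinate gradient step for a simple least-squares objective. It is illustrative only and not taken from the references; the problem sizes, block partition and step size are all assumptions made for the sketch.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

# Illustrative synchronous parallel block-coordinate descent for
# f(x) = 0.5 * ||A x - b||^2: each worker updates its own block of x
# using a shared snapshot of the current iterate.
# All sizes and parameters below are assumptions for this sketch.

rng = np.random.default_rng(0)
m, n, n_blocks = 200, 100, 4
A, b = rng.standard_normal((m, n)), rng.standard_normal(m)
blocks = np.array_split(np.arange(n), n_blocks)
x = np.zeros(n)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for this quadratic objective

def block_update(block, x_snapshot):
    # Gradient of f restricted to one block, evaluated at the shared snapshot.
    residual = A @ x_snapshot - b
    return block, x_snapshot[block] - step * (A[:, block].T @ residual)

for it in range(200):
    x_snapshot = x.copy()
    with ThreadPoolExecutor(max_workers=n_blocks) as pool:
        for block, new_vals in pool.map(lambda blk: block_update(blk, x_snapshot), blocks):
            x[block] = new_vals  # write-back; all blocks used the same snapshot

print("final objective:", 0.5 * np.linalg.norm(A @ x - b) ** 2)
```

Because every block update is computed at the same snapshot, the combined update here equals one full gradient step; the asynchronous algorithms in the bibliography relax exactly this synchronisation.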

The aim of this project is to investigate state-of-the-art parallel optimisation algorithms for nonconvex problems. There are several directions that could be taken, depending on the student's preferences. On the computational side, this could involve implementing these algorithms on suitable test problems, such as empirical risk minimisation for deep neural networks, and understanding how to make them efficient in practice (a small sketch of one such algorithm is given below). Alternatively, a more theoretical direction could involve analysing the convergence properties of these algorithms and investigating how to make them more general (e.g. by using more sophisticated optimisation procedures, such as second-order methods, or by coping with inexact gradient information).
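As an example of the computational direction, the sketch below implements asynchronous, lock-free stochastic gradient updates in the spirit of Hogwild! [3], applied to a simple logistic-regression empirical risk. It is a hedged illustration rather than the reference implementation: the data, step size, thread count and function names are assumptions made for this sketch.

```python
import numpy as np
import threading

# Hogwild!-style [3] asynchronous SGD sketch for logistic regression:
# several threads apply stochastic gradient updates to a shared weight
# vector without any locking, so reads may see stale values.
# All problem sizes and hyperparameters are illustrative assumptions.

rng = np.random.default_rng(1)
n_samples, n_features = 5000, 50
X = rng.standard_normal((n_samples, n_features))
y = (X @ rng.standard_normal(n_features) > 0).astype(float)  # labels in {0, 1}

w = np.zeros(n_features)  # shared parameters, updated lock-free
step = 0.1

def worker(seed, n_steps):
    local_rng = np.random.default_rng(seed)  # per-thread sampling
    for _ in range(n_steps):
        i = local_rng.integers(n_samples)     # pick one data point
        margin = X[i] @ w                     # read possibly stale w
        grad = (1.0 / (1.0 + np.exp(-margin)) - y[i]) * X[i]
        w -= step * grad                      # in-place, unsynchronised update

threads = [threading.Thread(target=worker, args=(seed, 2000)) for seed in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

loss = np.mean(np.log1p(np.exp(-(2 * y - 1) * (X @ w))))
print("final logistic loss:", loss)
```

Because of Python's global interpreter lock, this sketch illustrates the lock-free access pattern rather than a genuine speed-up; a practical implementation would use compiled code or a framework with true shared-memory parallelism, as in [3].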

This project would suit students interested in numerical analysis, optimisation, machine learning, or high-performance computing.

Bibliography:

  1. L. Cannelli, F. Facchinei, V. Kungurtsev, G. Scutari, Asynchronous parallel algorithms for nonconvex optimization, Mathematical Programming, to appear (2019).
  2. F. Facchinei, G. Scutari, S. Sagratella, Parallel Selective Algorithms for Nonconvex Big Data Optimization, IEEE Transactions on Signal Processing, 63:7 (2015), pp. 1874-1889.
  3. B. Recht, C. Ré, S. Wright, F. Niu, Hogwild!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent, Advances in Neural Information Processing Systems (NIPS), 24 (2011).
  4. P. Richtárik, M. Takáč, Parallel coordinate descent methods for big data optimization, Mathematical Programming, 156:1-2 (2016), pp. 433-484.