#### Research Overview

Many areas of science, engineering, and industry are already being revolutionized by the adoption of tools and techniques from data science. However, a rigorous analysis of existing approaches together with the development of new ideas is necessary to a) ensure the optimal use of available computational and statistical resources and b) develop a principled and systematic approach to the relevant problems rather than relying on a collection of ad hoc solutions. In particular, there are many interrelated questions that arise in a typical data science project.

- First is the acquisition of relevant data: Can data be collected interactively and might this reduce the costs of data acquisition? Is the data noisy and how might this impact the results?
- Second is the processing of data: If the data cannot fit in the memory of a single machine, how can we minimize the communication costs within a cluster of machines? When are approximate answers sufficient and how does the required accuracy trade off with the computational resources available?
- Third is the predictive value of the available data: Can the uncertainty of the final results be quantified? How can the modeling assumptions used by our algorithms be efficiently evaluated?

Research directions include:

- Understanding the trade-off between rounds of interactive data acquisition and statistical and computational efficiency.
- Minimizing query complexity in interactive unsupervised learning problems.
- Understanding space/sample complexity trade-offs when processing stochastic data.
- Developing fine-grained approximation algorithms relevant to core data science tasks.
- Using coding theory to enable communication-efficient distributed machine learning.
- Designing variational inference methods with statistical guarantees given limited resources.
- Developing a principled approach to exploiting trade-offs between bias, model complexity, and computational budget.

#### Publications

- Algebraic and Analytic Approaches for Parameter Learning in Mixture Models.
  ALT 2020 (with A. Krishnamurthy, A. Mazumdar, A. McGregor, S. Pal)

- Data-driven Uncertainty Quantification in Systematically Coarse-grained Models.
  Submitted. (T. Jin, A. Chazirakis, E. Kalligiannaki, V. Harmandaris and M. A. Katsoulakis)

- Distributional Robustness and Uncertainty Quantification for Rare Events.
  ArXiv 2019 (J. Birrell, P. Dupuis, M. Katsoulakis, L. Rey-Bellet, J. Wang)

- Vertex Ordering Problems in Directed Graph Streams.
  SODA 2020 (A. Chakrabarti, P. Ghosh, A. McGregor, S. Vorotnikova)

- Sample Complexity of Learning Mixtures of Sparse Linear Regressions.
  NeurIPS 2020 (A. Krishnamurthy, A. Mazumdar, A. McGregor, S. Pal)