Research Overview
Many areas of science, engineering, and industry are already being revolutionized by the adoption of tools and techniques from data science. However, a rigorous analysis of existing approaches together with the development of new ideas is necessary to a) ensure the optimal use of available computational and statistical resources and b) develop a principled and systematic approach to the relevant problems rather than relying on a collection of ad hoc solutions. In particular, there are many interrelated questions that arise in a typical data science project.- First is the acquisition of relevant data: Can data be collected interactively and might this reduce the costs of data acquisition? Is the data noisy and how might this impact the results?
- Second is the processing of data: If the data cannot fit in the memory of a single machine, how can we minimize the communication costs within a cluster of machines? When are approximate answers sufficient and how does the required accuracy trade off with the computational resources available?
- Third is the prediction value of the available data: Can the uncertainty of the final results be quantified? How can the modeling assumptions used by our algorithms be efficiently evaluated?
- Understanding the trade-off between rounds of interactive data acquisition and statistical and computational efficiency.
- Minimizing query complexity in interactive unsupervised learning problems.
- Understanding space/sample complexity trade-offs when processing stochastic data.
- Developing fine-grained approximation algorithms relevant to core data science tasks.
- Using coding theory to enable communication-efficient distributed machine learning.
- Designing variational inference methods with statistical guarantees given limited resources.
- Developing a principled approach to exploiting trade-offs between bias, model complexity, and computational budget.
Publications
- Sample Complexity of Probability Divergences under Group Symmetry
ICML 2023 (Z. Chen, M. A. Katsoulakis, L. Rey-Bellet, W. Zhu) - Identification of significant gene expression changes in multiple perturbation experiments using knockoffs
Briefings in Bioinformatics 2023 (T. Zhao, G. Zhu, H. V. Dubey, P. Flaherty) - Function-space regularized Rényi divergences
ICLR 2023 (J. Birrell, P. Dupuis, M. A. Katsoulakis, Y. Pantazis, L. Rey-Bellet) - Model Uncertainty and Correctability for Directed Graphical Models
SIAM/ASA Journal on Uncertainty Quantification 10 (4),1461-1512, (2022) (P. Birmpa, J. Feng, M. A. Katsoulakis, L. Rey-Bellet) - Structure-preserving GANs
Proceedings of the 39th International Conference on Machine Learning, PMLR 162:1982-2020, (2022). (J. Birrell, M. A. Katsoulakis, L. Rey-Bellet, W. Zhu) - Cumulant GAN
IEEE Transactions on Neural Networks and Learning Systems, (2022), doi: 10.1109/TNNLS.2022.3161127 (Y. Pantazis, D. Paul, M. Fasoulakis, Y. Stylianou, M. A. Katsoulakis) - Optimizing variational representations of divergences and accelerating their statistical estimation
IEEE Transactions on Information Theory, (2022), doi: 10.1109/TIT.2022.3160659 (J. Birrell, M. A. Katsoulakis, Y. Pantazis) - Uncertainty quantification for Markov Random Fields
SIAM/ASA J. Uncertainty Quantification, 9(4), 1457–1498, (2021). https://doi.org/10.1137/20M1374614 (P. Birmpa, M. A. Katsoulakis) - (f , Γ)-Divergences: Interpolating between f-Divergences and Integral Probability Metrics
Journal of Machine Learning Research, 23 (39), 1-70, (2022) (J. Birrell, P. Dupuis, M. A. Katsoulakis, Y. Pantazis, L. Rey-Bellet) - Uncertainty Quantification and Error Propagation in the Enthalpy and Entropy of Surface Reactions Arising from a Single DFT Functional
J. Phys. Chem, (Gerhard Wittreich, Geun Ho Gu, Daniel Robinson, Markos Katsoulakis, Markos, Dionisios Vlachos) - Probabilistic Group Testing with a Linear Number of Tests
IEEE International Symposium on Information Theory (ISIT), 2021 (Larkin Flodin, Arya Mazumdar) - Variational Representations and Neural Network Estimation of Renyi Divergences
SIAM Journal on Mathematics of Data Science (J. Birrell, P. Dupuis, M. A. Katsoulakis, L. Rey-Bellet and J. Wang) - Mutual Information for Explainable Deep Learning of Multiscale Systems
Journal of Computational Physics, Volume 444, 2021, 110551, .(Søren Taverniers, Eric J. Hall, Markos A. Katsoulakis, Daniel M. Tartakovsky) - Trace Reconstruction: Generalized and Parameterized
IEEE Transactions on Information Theory (A. Krishnamurthy, A. Mazumdar, A. McGregor, S. Pal) - PredictRoute: A Network Path Prediction Toolkit.
Sigmetrics 2021 (R. Singh, D. Tench, A. McGregor, P. Gill) - Correlation Clustering in Data Streams
Algorithmica (K. Ahn, G. Cormode, S. Guha, A. McGregor A. Wirth) - Exact and Approximate Hierarchical Clustering Using A*
UAI 2021 (Craig S. Greenberg, Sebastian Macaluso, Nicholas Monath, Avinava Dubey, Patrick Flaherty, Manzil Zaheer, Amr Ahmed, Kyle Cranmer and Andrew McCallum) - Doubly Non-Central Beta Matrix Factorization for DNA Methylation Data
UAI 2021 (Aaron Schein, Anjali Nagulpally, Hanna Wallach, and Patrick Flaherty) - GINNs: Graph-Informed Neural Networks for Multiscale Physics
Journal of Computational Physics, 2021 (E. J. Hall, S. Taverniers, M. A. Katsoulakis and D. Tartakovsky) - Efficient and Effective ER with Progressive Blocking
VLDB Journal, 2021. (S. Galhotra, D. Firmani, B. Saha, D. Srivastava) - Cluster Trellis: Data Structures & Algorithms for Exact Inference in Hierarchical Clustering
AISTATS 2021 (C. Greenberg, S. Macaluso, N. Monath, J. Lee, P. Flaherty, K. Cranmer, A. McGregor, A. McCallum) - Intervention Efficient Algorithms for Approximate Learning of Causal Graphs
ALT 2021 (R. Addanki, A. McGregor, C. Musco) - Diverse Data Selection under Fairness Constraints
ICDT 2021 (Z. Moumoulidou, A. McGregor, A. Meliou) - Maximum Coverage in the Data Stream Model: Parameterized and Generalized
ICDT 2021 (A. McGregor, D. Tench, H. Vu) - Semisupervised Clustering by Queries and Locally Encodable Source Coding
IEEE Transactions on Information Theory (TIT), vol. 67, no. 2, 2021. (A. Mazumdar, S. Pal) - vqSGD: Vector Quantized Stochastic Gradient Descent
AISTATS 2021. (V. Gandikota, D. Kane, R. Maity, A. Mazumdar) - Recovery of Sparse Linear Classifiers from Mixture of Responses
NeurIPS 2020. (V. Gandikota, A. Mazumdar, S. Pal) - A Workload-Adaptive Mechanism for Linear Queries Under Local Differential Privacy
VLDB 2020. (R. McKenna, R. Maity, A. Mazumdar, G. Miklau) - Recovery of Sparse Signals from a Mixture of Linear Samples
ICML 2020. (A. Mazumdar, S. Pal) - Explainable and trustworthy artificial intelligence for correctable modeling in chemical sciences
Science Advances (J. Feng, J. L. Lansford, M. A. Katsoulakis, M. G. Vlachos) - Sublinear-Time Algorithms for Computing & Embedding Gap Edit Distance
FOCS 2020 (T. Kociumaka, B. Saha) - High Dimensional Discrete Integration over the Hypergrid
UAI 2020 (R. Maity, A. Mazumdar, S. Pal) - Efficient Intervention Design for Causal Discovery with Latents
ICML 2020 (with R. Addanki, S. Kasiviswanathan, C. Musco) - Quantification of Model Uncertainty on Path-Space via Goal-Oriented Relative Entropy
ESAIM: M2AN, 55 1 (2021) 131-169 (Jeremiah Birrell, Markos A. Katsoulakis and Luc Rey-Bellet) - Does Preprocessing help in Fast Sequence Comparisons?
STOC 2020 (E. Goldenberg, A. Rubinstein, B. Saha) - Reliable Distributed Clustering with Redundant Data Assignment
ISIT 2020 (V. Gandikota, A. Mazumdar, A. S. Rawat) - Triangle and Four Cycle Counting in the Data Stream Model
PODS 2020 (A. McGregor, S. Vorotnikova) - Algebraic and Analytic Approaches for Parameter Learning in Mixture Models
ALT 2020 (with A. Krishnamurthy, A. Mazumdar, A. McGregor, S. Pal) - Data-driven Uncertainty Quantification in Systematically Coarse-grained Models
Soft Materials, 18:2-3, 348-368. (T. Jin, A. Chazirakis, E. Kalligiannaki, V. Harmandaris and M. A. Katsoulakis) - Vertex Ordering Problems in Directed Graph Streams.
SODA 2020 (A. Chakrabarti, P. Ghosh, A. McGregor, S. Vorotnikova) - Sample Complexity of Learning Mixtures of Sparse Linear Regressions
NeurIPS 2019 (A. Krishnamurthy, A. Mazumdar, A. McGregor, S. Pal)
Preprints
- Optimizing variational representations of divergences and accelerating their statistical estimation
ArXiv 2020, Under revision at IEEE Trans Info Theory (J. Birrell, M. A. Katsoulakis, Y. Pantazis) - Cumulant GAN
ArXiv 2020, Under revision at IEEE Trans NN and Learning Syst. (Y. Pantazis, D. Paul, M. Fassoulakis, Y. Stylianou, M. A. Katsoulakis) - MAP Clustering under the Gaussian Mixture Model via Mixed Integer Nonlinear Optimization
ArXiv 2020 (P. Flaherty, P. Wiratchotisatian, J. Lee, Z. Tang, A. Trapp) - Distributionally Robust Variance Minimization: Tight Variance Bounds over f-Divergence Neighborhoods
ArXiv 2020 (J. Birrell) - Distributional Robustness and Uncertainty Quantification for Rare Events
ArXiv 2019 (J. Birrell, P. Dupuis, M. Katsoulakis, L. Rey-Bellet, J. Wang) - Model Uncertainty and Correctability for Directed Graphical Models
(Panagiota Birmpa, Jinchao Feng, Markos Katsoulakis, Luc Rey-Bellet)