Saurav Muralidharan

Senior Research Scientist, NVIDIA
Email: mail [at] sauravm.com

Twitter | LinkedIn | CV

I am a scientist at NVIDIA Research, working in the Deep Learning Efficiency Research (DLER) team. My work focuses on improving the runtime performance and efficiency of deep neural networks, especially large language models (LLMs), using techniques like model compression (sparsity, low-rank factorization, distillation, etc.) and neural architecture search (NAS).

Prior to joining NVIDIA, I completed my Ph.D. in Computer Science at the University of Utah under the guidance of Prof. Mary Hall. While at Utah, I worked on machine learning-based techniques to improve the performance, portability, and energy efficiency of GPU programs.

Recent Publications

[ Google Scholar | DBLP ]
HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity
Y. N. Wu, P. Tsai, S. Muralidharan, A. Parashar, V. Sze, J. Emer
arXiv 2305.12718 (2023).  [ pdf]
Uniform Sparsity in Deep Neural Networks
S. Muralidharan
Sixth Conference on Machine Learning and Systems (MLSys 2023).  [ pdf]
Efficient Sparsely Activated Transformers
S. Latifi, S. Muralidharan, M. Garland
arXiv 2208.14580 (2022).  [ pdf]
Going Beyond Classification Accuracy Metrics in Model Compression
V. Joseph, S. A. Siddiqui, A. Bhaskara, G. Gopalakrishnan, S. Muralidharan, M. Garland, S. Ahmed, A. Dengel
arXiv 2012.01604 (2021).  [ pdf]
A Programmable Approach to Neural Network Compression
V. Joseph, G. Gopalakrishnan, S. Muralidharan, M. Garland, A. Garg
IEEE Micro Special Issue on Machine Learning for Systems, 2020.
[ code | pdf (arXiv) | talk ]

Open-Source Software

[ GitHub Profile ]
Condensa
github.com/NVLabs/condensa

Condensa is a Python framework for programmable model compression. It provides a set of built-in compression operators that can be composed into complex compression schemes targeting specific combinations of DNN architecture, hardware platform, and optimization objective. To recover accuracy lost during compression, Condensa formulates model compression as a constrained optimization problem and solves it using an Augmented Lagrangian-based algorithm.
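As a rough illustration of the kind of operator such schemes compose, consider magnitude pruning, which zeroes all but the largest-magnitude weights. The sketch below is a minimal standalone version in NumPy; the function name and signature are hypothetical and do not reflect Condensa's actual API.

```python
import numpy as np

def magnitude_prune(weights, density):
    """Hypothetical compression operator: keep the `density` fraction
    of weights with the largest magnitude and zero out the rest."""
    flat = np.abs(weights).ravel()
    k = int(round(density * flat.size))  # number of weights to keep
    if k == 0:
        return np.zeros_like(weights)
    # Threshold at the k-th largest magnitude via partial selection.
    threshold = np.partition(flat, flat.size - k)[flat.size - k]
    return np.where(np.abs(weights) >= threshold, weights, 0.0)

w = np.array([[0.9, -0.05, 0.3],
              [-0.7, 0.02, -0.4]])
pruned = magnitude_prune(w, density=0.5)  # keeps 0.9, -0.7, -0.4
```

A real scheme would apply such operators layer-wise and then fine-tune to recover accuracy, which is the part Condensa's constrained-optimization formulation automates.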

TensorLy-Torch
tensorly.org/torch

Tensor methods generalize matrix algebraic operations to higher orders, and can help deep neural networks better preserve and leverage local structure. TensorLy-Torch is a PyTorch library, built on top of TensorLy, that provides out-of-the-box tensor layers and aims to make it as easy as possible to use tensor methods within deep networks.
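The order-2 special case of such factorized layers can be sketched in plain NumPy: a dense weight matrix is replaced by a product of two smaller factors via truncated SVD. This is an illustration of the underlying idea only, not TensorLy-Torch's API, which operates on higher-order tensors.

```python
import numpy as np

# Build a weight matrix with rank at most 32, then factorize it.
rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) @ rng.standard_normal((32, 128))

U, s, Vt = np.linalg.svd(W, full_matrices=False)
rank = 32
A = U[:, :rank] * s[:rank]   # shape (64, rank)
B = Vt[:rank]                # shape (rank, 128)

# One dense layer becomes two smaller ones: 64*128 = 8192 parameters
# versus 64*32 + 32*128 = 6144; exact here because rank(W) <= 32.
assert np.allclose(W, A @ B)
```

Higher-order factorizations (CP, Tucker, tensor-train) apply the same idea to convolutional and attention weights while keeping their multi-dimensional structure intact.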

Nitro Autotuning Framework
nitro-tuner.github.io

Nitro is a programmer-directed code variant tuning framework, jointly developed by the University of Utah and NVIDIA Research. It utilizes machine learning-based classification to automatically find the best implementation (variant) of a computation for a given input. Nitro provides C++ and Python interfaces for programmers to specify variants, input dataset features, and constraints.
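The core idea, learning a mapping from input features to the fastest variant, can be sketched with a simple nearest-centroid classifier. All names, features, and data below are hypothetical illustrations, not Nitro's actual C++/Python interfaces.

```python
import numpy as np

# Offline training data: input feature vectors (e.g., problem size and
# sparsity) labeled with the index of the fastest variant as measured.
features = np.array([[1e3, 0.01], [1e6, 0.02], [1e3, 0.90], [1e6, 0.95]])
best_variant = np.array([0, 0, 1, 1])  # 0 = dense kernel, 1 = sparse kernel

# Normalize features, then compute one centroid per variant.
mu, sigma = features.mean(axis=0), features.std(axis=0)
normed = (features - mu) / sigma
centroids = np.stack(
    [normed[best_variant == v].mean(axis=0) for v in (0, 1)]
)

def select_variant(x):
    """Pick the variant whose training centroid is nearest to input x."""
    z = (np.asarray(x, dtype=float) - mu) / sigma
    return int(np.argmin(np.linalg.norm(centroids - z, axis=1)))

variant = select_variant([5e5, 0.85])  # high sparsity -> sparse kernel
```

At runtime, the selected variant index would dispatch to the corresponding implementation; the framework's job is to gather the training measurements and fit such a model automatically.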

Professional Service