Publications
- WASL: Harmonizing Uncoordinated Adaptive Modules in Multi-Tenant Cloud Systems
Ahsan Pervaiz, Anwesha Das, Vedant Kodagi, Muhammad Husni Santriaji, Henry Hoffmann
ACM/SPEC ICPE 2026 [PDF] [Slides] [Artifact]
- Anomaly Localization for Performance Instabilities at Complex Accelerator Facilities
Under Submission [Preprint]
- Flexible Windowing for Correlation-Aware Ranking in Anomalous Environments
Anwesha Das, Henry Hoffmann, Alex Aiken
IEEE ICDM 2025 [PDF] [Slides]
- Prolego: Time-Series Analysis for Predicting Failures in Complex Systems
Anwesha Das, Alex Aiken
IEEE ACSOS 2023 [PDF] [Slides] [Code]
- Performance Variability and Causality in Complex Systems
Anwesha Das, Daniel Ratner, Alex Aiken
IEEE ACSOS 2022 [PDF] [Talk] [Poster]
- Proactive Resilience via Log Mining for Production Systems
Under Submission [Preprint]
- Systemic Assessment of Node Failures in HPC Production Platforms
Anwesha Das, Frank Mueller, Barry Rountree
IEEE IPDPS 2021 [PDF] [Slides] [Talk] (requires subscription) [Code] [Data]
- Aarohi: Making Real-time Node Failure Prediction Feasible
Anwesha Das, Frank Mueller, Barry Rountree
IEEE IPDPS 2020 [PDF] [Slides] [Code]
- Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC
Anwesha Das, Frank Mueller, Charles Siegel, Abhinav Vishnu
ACM HPDC 2018 [PDF] [Poster]
- Doomsday: Predicting Which Node Will Fail When on Supercomputers
Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman, Scott Baden
ACM/IEEE SC 2018 (Best Student Paper Finalist) [PDF] [Data] [Slides] [HPCWire Coverage]
- KeyValueServe: Design and Performance Analysis of a Multi-Tenant Data Grid as a Cloud Service
Anwesha Das, Arun Iyengar, Frank Mueller
Concurrency and Computation: Practice and Experience, June 2018 [PDF] [Link]
- Performance Analysis of a Multi-Tenant In-memory Data Grid
Anwesha Das, Frank Mueller, Xiaohui Gu, Arun Iyengar
IEEE Cloud 2016 [PDF]
- Dynamic Resource Management using Virtual Machine Migrations
Mayank Mishra, Anwesha Das, Purushottam Kulkarni, Anirudha Sahoo
IEEE Communications Magazine, June 2012 [PDF]
Peer-Reviewed Short Papers and Posters
- Anomaly Detection in Accelerator Facilities Using Machine Learning
Anwesha Das, Daniel Ratner, Michael Borland, Louis Emery, Xiaobiao Huang, Hairong Shang, Guobao Shen, Reid Smith, Guimei Wang
International Particle Accelerator Conference, IPAC'21 [PDF] [Poster]
- Holistic Root Cause Analysis of Node Failures in Production HPC
Anwesha Das, Frank Mueller
ACM SRC SC'18 [PDF] [Poster]
- Aarohi: Efficient Online Failure Prediction
Anwesha Das, Frank Mueller
ACM SRC ASPLOS'18 (Semi-Finalist, amongst Top-5) [PDF] [Poster]
- Desh: Deep Learning for HPC System Health Resilience
Anwesha Das, Abhinav Vishnu, Charles Siegel, Frank Mueller
ACM/IEEE SC'17 [PDF] [Poster]
- Pin-Pointing Node Failures in HPC Systems
Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman
ACM/IEEE SC'16 [PDF] [Poster]
Theses
- PhD (2019): Predicting Location and Time of Anomalies in Large-Scale Computing Systems via Log Mining [Link] [PDF]
- Master's (2012): A Comparative Analysis of Server Consolidation Algorithms on a Novel Software Framework [Link] [PDF]