Publications

  1. WASL: Harmonizing Uncoordinated Adaptive Modules in Multi-Tenant Cloud Systems
    Ahsan Pervaiz, Anwesha Das, Vedant Kodagi, Muhammad Husni Santriaji, Henry Hoffmann
    ACM/SPEC ICPE 2026 [PDF] [Slides] [Artifact]
  2. Anomaly Localization for Performance Instabilities at Complex Accelerator Facilities
    Under Submission [Preprint]
  3. Flexible Windowing for Correlation-Aware Ranking in Anomalous Environments
    Anwesha Das, Henry Hoffmann, Alex Aiken
    IEEE ICDM 2025 [PDF] [Slides]
  4. Prolego: Time-Series Analysis for Predicting Failures in Complex Systems
    Anwesha Das, Alex Aiken
    IEEE ACSOS 2023 [PDF] [Slides] [Code]
  5. Performance Variability and Causality in Complex Systems
    Anwesha Das, Daniel Ratner, Alex Aiken
    IEEE ACSOS 2022 [PDF] [Talk] [Poster]
  6. Proactive Resilience via Log Mining for Production Systems
    Under Submission [Preprint]
  7. Systemic Assessment of Node Failures in HPC Production Platforms
    Anwesha Das, Frank Mueller, Barry Rountree
    IEEE IPDPS 2021 [PDF] [Slides] [Talk] (requires subscription) [Code] [Data]
  8. Aarohi: Making Real-time Node Failure Prediction Feasible
    Anwesha Das, Frank Mueller, Barry Rountree
    IEEE IPDPS 2020 [PDF] [Slides] [Code]
  9. Desh: Deep Learning for System Health Prediction of Lead Times to Failure in HPC
    Anwesha Das, Frank Mueller, Charles Siegel, Abhinav Vishnu
    ACM HPDC 2018 [PDF] [Poster]
  10. Doomsday: Predicting Which Node Will Fail When on Supercomputers
    Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman, Scott Baden
    ACM/IEEE SC 2018 (Best Student Paper Finalist) [PDF] [Data] [Slides] [HPCWire Coverage]
  11. KeyValueServe: Design and Performance Analysis of a Multi-Tenant Data Grid as a Cloud Service
    Anwesha Das, Arun Iyengar, Frank Mueller
    Concurrency and Computation: Practice and Experience, June 2018 [PDF] [Link]
  12. Performance Analysis of a Multi-Tenant In-memory Data Grid
    Anwesha Das, Frank Mueller, Xiaohui Gu, Arun Iyengar
    IEEE CLOUD 2016 [PDF]
  13. Dynamic Resource Management using Virtual Machine Migrations
    Mayank Mishra, Anwesha Das, Purushottam Kulkarni, Anirudha Sahoo
    IEEE Communications Magazine, June 2012 [PDF]

Peer-Reviewed Short Papers and Posters

  1. Anomaly Detection in Accelerator Facilities Using Machine Learning
    Anwesha Das, Daniel Ratner, Michael Borland, Louis Emery, Xiaobiao Huang, Hairong Shang, Guobao Shen, Reid Smith, Guimei Wang
    International Particle Accelerator Conference, IPAC'21 [PDF] [Poster]
  2. Holistic Root Cause Analysis of Node Failures in Production HPC
    Anwesha Das, Frank Mueller
    ACM SRC SC'18 [PDF] [Poster]
  3. Aarohi: Efficient Online Failure Prediction
    Anwesha Das, Frank Mueller
    ACM SRC ASPLOS'18 (Semi-Finalist, among the top 5) [PDF] [Poster]
  4. Desh: Deep Learning for HPC System Health Resilience
    Anwesha Das, Abhinav Vishnu, Charles Siegel, Frank Mueller
    ACM/IEEE SC'17 [PDF] [Poster]
  5. Pin-Pointing Node Failures in HPC Systems
    Anwesha Das, Frank Mueller, Paul Hargrove, Eric Roman
    ACM/IEEE SC'16 [PDF] [Poster]

Theses