§ 02 research

Themes

Six threads of work across hardware, software, and the systems that connect them — with a strong bias toward making ML cheaper and more durable on power-constrained substrates.

§ 01 theme

Energy-Harvesting & Intermittent Computing

Devices powered by ambient energy — solar, RF, vibration — operate intermittently. Conventional ML pipelines assume reliable power; the real world doesn't cooperate.

My work re-architects the entire stack: training procedures that bake intermittency into the optimizer, schedulers that handle partial computations, accelerators that adapt their shape to available power, and communication co-design that minimizes what has to leave the node in the first place.
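To make the intermittency problem concrete, here is a toy sketch (my own illustration, not any paper's actual procedure) of training that survives power loss: every step checkpoints to a stand-in for non-volatile memory, and a simulated brown-out wipes only the volatile state.

```python
import random

def intermittent_train(steps=200, fail_prob=0.3, lr=0.1):
    """Toy gradient descent on f(w) = (w - 3)^2 under intermittent power.

    A dict stands in for non-volatile memory (NVM); real energy-harvesting
    nodes checkpoint to FRAM/flash instead.
    """
    nvm = {"w": 0.0, "step": 0}              # persistent across power cycles
    while nvm["step"] < steps:
        w, step = nvm["w"], nvm["step"]      # power-on: restore checkpoint
        while step < steps:                  # one "energized" episode
            grad = 2.0 * (w - 3.0)
            w -= lr * grad
            step += 1
            nvm["w"], nvm["step"] = w, step  # checkpoint before failure can hit
            if random.random() < fail_prob:  # simulated brown-out
                break                        # volatile state is lost
    return nvm["w"]
```

Checkpointing every step is deliberately naive; the interesting design space is how rarely you can afford to checkpoint and still converge.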

representative papers

  • Revisiting DNN Training for Intermittently-Powered EH Micro-Computers ICLR 2025
  • Synergistic and Efficient Edge-Host Communication for EH-WSNs arXiv 2024
  • Seeker: Synergizing Mobile and EH Wearable Sensors for HAR arXiv 2022
  • Origin: Enabling On-Device Intelligence for HAR using EH-WSNs DATE 2021
  • ResiRCA: Resilient Energy Harvesting ReRAM-based Accelerator HPCA 2020

§ 02 theme

Computational Storage for ML

When the model is small relative to the dataset, the bottleneck isn't compute — it's I/O. Computational storage moves work into the storage device itself, so encrypted, compressed, redundant data never has to cross the host bus before it's useful.

I build FPGA-accelerated CSDs that handle near-data compression, encryption, and redundancy for continuous-learning edge servers and parallel query pipelines.
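The pushdown idea can be sketched in a few lines. `MockCSD` is a hypothetical stand-in for an FPGA computational storage device: data lives compressed on-device, and a predicate is evaluated near the data so only matching records cross the (simulated) host bus.

```python
import zlib

class MockCSD:
    """Stand-in for a computational storage device (illustrative only)."""

    def __init__(self, records):
        # Data is stored compressed on-device.
        self._blocks = [zlib.compress(r.encode()) for r in records]

    def read_all(self):
        """Conventional path: ship every block to the host."""
        return [zlib.decompress(b).decode() for b in self._blocks]

    def query(self, predicate):
        """Pushdown path: decompress and filter near the data."""
        out = []
        for b in self._blocks:
            rec = zlib.decompress(b).decode()
            if predicate(rec):
                out.append(rec)
        return out

csd = MockCSD(["cat,1", "dog,2", "cat,3"])
hits = csd.query(lambda r: r.startswith("cat"))  # only matches leave the device
```

In the real systems the decompression, decryption, and filtering run on the CSD's FPGA fabric rather than in host Python.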

representative papers

  • Salient Store: Smart Storage for Continuous-Learning Edge Servers PACT 2025
  • CORD: Parallelizing Query Processing across Multiple Computational Storage Devices IPDPS 2025

§ 03 theme

Sustainable Edge Servers

A solar-powered edge server that doesn't need a battery is an architecture problem first and a power problem second. The accelerator has to morph as power fluctuates; the training procedure has to keep something useful happening even when the panel goes quiet.
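A minimal sketch of the "morphing" decision, with made-up configuration names and power numbers: pick the widest accelerator configuration that fits the instantaneous harvested-power budget, and drop to a sleep mode when the panel goes quiet.

```python
def pick_config(power_mw, configs):
    """Choose the widest feasible accelerator shape for the current budget.

    Configurations and numbers are hypothetical, for illustration only.
    """
    feasible = [c for c in configs if c["power_mw"] <= power_mw]
    if not feasible:
        # Panel is quiet: no compute fits, so park in a checkpoint/sleep mode.
        return {"name": "sleep", "lanes": 0, "power_mw": 0}
    return max(feasible, key=lambda c: c["lanes"])

CONFIGS = [
    {"name": "full",    "lanes": 8, "power_mw": 400},
    {"name": "half",    "lanes": 4, "power_mw": 220},
    {"name": "quarter", "lanes": 2, "power_mw": 120},
]
```

The real problem is harder than this greedy rule suggests: reconfiguration itself costs time and energy, so the scheduler has to forecast the power trace, not just react to it.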

Closely related work on retrieval-augmented generation pipelines for edge devices explores the same question for inference.

representative papers

  • Usás: A Sustainable Continuous-Learning Framework for Edge Servers HPCA 2024
  • MaestroRAG: Orchestrated Pipeline Architecture for Efficient RAG on Edge Devices Under Review 2025


§ 04 theme

Efficient ML Inference & Serving

Whether the constraint is a public-cloud SLO or a router's power envelope, inference serving is a multi-dimensional optimization problem.

My contributions span ensemble-based serving in the cloud (Cocktail), serverless container provisioning for dynamic DAGs (Kraken), transfer-learning-based perf prediction (MLPP), and expert prediction for sparse mixture-of-experts inference (Prophet).
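For the mixture-of-experts case, the core serving primitive is picking which experts' weights to have resident before the token arrives. A toy top-k router (my illustration, not Prophet's actual predictor) looks like this:

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def topk_experts(gate_logits, k=2):
    """Return indices of the k highest-probability experts.

    In a serving system this is the set whose weights you prefetch so
    expert I/O overlaps with the rest of the layer's compute.
    """
    probs = softmax(gate_logits)
    return sorted(range(len(probs)), key=lambda i: -probs[i])[:k]
```

Prediction earns its keep when it runs ahead of the router, so the prefetch can be issued layers in advance rather than at dispatch time.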

representative papers

  • Prophet: Neural Expert Prediction for Efficient MoE Inference Under Review 2025
  • Cocktail: Multidimensional Optimization for Model Serving in Cloud NSDI 2022
  • Kraken: Adaptive Container Provisioning for Dynamic DAGs in Serverless SoCC 2021
  • MLPP: Transfer Learning and Model Distillation for Predicting App Performance NAS 2021
  • Implications of Public Cloud Resource Heterogeneity for Inference Serving WoSC 2020


§ 05 theme

Hardware Co-design for Specialized Compute

Domain-specific accelerators for AR/VR, point clouds, and ReRAM crossbars are where architecture meets perception. Through spatio-temporal compute reuse for 360° VR streaming, octree-based point cloud compression at the edge, and analog-activation ReRAM neural-network co-design, I've made the case that co-design beats general-purpose silicon for these workloads.
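The compute-reuse idea generalizes beyond VR: when consecutive inputs barely change, skip the network and reuse the cached output. A toy sketch over 1-D "frames" (illustrative only; the papers operate on real video and hardware):

```python
def run_with_reuse(frames, infer, thresh=0.05):
    """Run `infer` only on frames that differ enough from the last one processed.

    `frames` are equal-length numeric sequences; `diff` is mean absolute
    per-element change versus the last frame that was actually inferred.
    """
    last_frame, last_out, outputs = None, None, []
    for f in frames:
        if last_frame is not None:
            diff = sum(abs(a - b) for a, b in zip(f, last_frame)) / len(f)
            if diff < thresh:
                outputs.append(last_out)  # compute reuse: no inference run
                continue
        last_out = infer(f)               # frame changed: pay for inference
        last_frame = f
        outputs.append(last_out)
    return outputs
```

The hardware question is where this comparison lives and how much intermediate state (not just final outputs) can be reused across frames.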

representative papers

  • Hardware-Aware NN Co-Design with Analog Activation for ReRAM Crossbars Under Review 2025
  • Pushing Point Cloud Compression to Edge MICRO 2022
  • Exploiting Frame Similarity for Efficient Inference on Edge Devices ICDCS 2022
  • HoloAR: On-the-fly Optimization of 3D Holographic Processing for AR MICRO 2021
  • Déjà view: Spatio-Temporal Compute Reuse for 360° VR Video Streaming ISCA 2020

§ 06 theme

SoC Performance & Power Modeling

My current work at Arm is on pre-silicon performance and power modeling for next-generation heterogeneous SoCs targeting Edge AI, IoT, and automotive workloads. The job is informing architectural trade-off decisions long before silicon exists — building analytical and empirical models, characterizing workloads, and integrating performance data into CI for regression detection.
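As a flavor of what "analytical model" means at this stage, here is a first-order roofline-style estimate (a generic textbook model, not Arm's internal tooling): a kernel's time is bounded by whichever of compute or memory traffic takes longer at peak rates.

```python
def roofline_time(flops, bytes_moved, peak_flops, peak_bw):
    """First-order kernel-time estimate: max of compute time and memory time.

    peak_flops in FLOP/s, peak_bw in bytes/s. Useful pre-silicon precisely
    because it needs only datasheet-level parameters, not RTL.
    """
    compute_time = flops / peak_flops
    memory_time = bytes_moved / peak_bw
    return max(compute_time, memory_time)
```

Empirical models then correct where real SoCs diverge from this bound, and the whole thing feeds CI so a regression in predicted performance is caught at commit time.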