publications
2025
-
ICLR ’25
Conference
CONGO: Compressive Online Gradient Optimization
Jeremy Carleton, Prathik Vijaykumar, Divyanshu Saxena, Dheeraj Narasimha, Srinivas Shakkottai, and Aditya Akella
We address the challenge of zeroth-order online convex optimization where the objective function’s gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function’s gradient even when the only information available is a limited number of function samples. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs. Here, a job must be processed by potentially many queues in sequence to produce an output, and the service time at any queue is a function of the resources allocated to that queue. Since resources are costly, the end-to-end latency for jobs must be balanced with the overall cost of the resources used. While the number of queues is substantial, the latency function primarily reacts to resource changes in only a few, rendering the gradient sparse. We tackle this problem by introducing the Compressive Online Gradient Optimization (CONGO) framework, which allows compressive sensing methods previously applied to stochastic optimization to achieve regret bounds with an optimal dependence on the time horizon without the full problem dimension appearing in the bound. For specific algorithms, we reduce the samples required per gradient estimate to scale with the gradient’s sparsity factor rather than its full dimensionality. Numerical simulations and real-world microservices benchmarks demonstrate CONGO’s superiority over gradient descent approaches that do not account for sparsity.
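As a rough illustration of the approach the abstract describes, the sketch below estimates a sparse gradient from a handful of function samples: random directional finite differences act as compressed measurements of the gradient, and an l1-regularized least-squares solve (a plain ISTA loop here) recovers it. The objective, sample count, and solver settings are illustrative assumptions, not the CONGO algorithm itself.

```python
import numpy as np

def estimate_sparse_gradient(f, x, num_samples=20, delta=1e-3, lam=1e-3, iters=300):
    """Recover an (approximately) sparse gradient of f at x from a few function
    evaluations, in the spirit of compressive-sensing-based zeroth-order methods.
    Illustrative sketch only."""
    d = x.size
    A = np.random.randn(num_samples, d) / np.sqrt(num_samples)  # measurement matrix
    fx = f(x)
    # Each measurement is a directional finite difference: (f(x + delta*a) - f(x)) / delta ~ a . grad f(x)
    y = np.array([(f(x + delta * a) - fx) / delta for a in A])
    # ISTA (iterative soft-thresholding) for the LASSO problem
    #   min_g 0.5 * ||A g - y||^2 + lam * ||g||_1
    g = np.zeros(d)
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    for _ in range(iters):
        z = g - step * (A.T @ (A @ g - y))
        g = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return g

if __name__ == "__main__":
    # Toy usage: a 100-dimensional function whose gradient has only 3 non-zero entries.
    d, support = 100, [3, 40, 77]
    w = np.zeros(d); w[support] = [2.0, -1.5, 0.7]
    f = lambda x: w @ x
    x0 = np.random.randn(d)
    g_hat = estimate_sparse_gradient(f, x0)
    print(np.round(g_hat[support], 2))  # close to [2.0, -1.5, 0.7] with only ~20 samples
```

On this toy objective, roughly 20 samples suffice to recover a 3-sparse gradient in 100 dimensions, which is the flavor of sample saving the abstract refers to.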
-
ASPLOS ’25
Conference
Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies
Divyanshu Saxena, William Zhang, Shankara Pailoor, Isil Dillig, and Aditya Akella
Distributed microservice applications require a convenient means of controlling L7 communication between services. Service meshes have emerged as a popular approach to achieving this. However, current service mesh frameworks are difficult to use: they burden developers in realizing even simple communication policies, lack compatibility with diverse dataplanes, and introduce performance and resource overheads. We identify the root causes of these drawbacks and propose a ground-up new mesh architecture that overcomes them. We develop novel abstractions for mesh communication, a new mesh policy language centered on these abstractions to enable expressive policies, and a novel control plane that enables using minimal dataplane resources for policy enforcement. We develop the precise semantics of our language abstractions and demonstrate how our control plane can use them to execute policies correctly and optimally. We build and evaluate a prototype on realistic workloads and policies and open-source production traces. Our results show that complex policies can be specified in up to 6.75X fewer lines, and enforced with up to 2.6X smaller tail latencies and up to 39% fewer CPU resources than today.
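To make the placement idea concrete, here is a deliberately simplified sketch of a mesh policy as data plus a control-plane choice of where to enforce it so that fewer dataplane proxies must hold policy state. The policy class, services, and placement rule are hypothetical illustrations, not the Copper and Wire abstractions or language.

```python
from dataclasses import dataclass

@dataclass
class RateLimitPolicy:
    src: str          # calling service
    dst: str          # called service
    limit_rps: int    # requests per second allowed on src -> dst calls

def choose_enforcement_point(policy, replicas):
    """Toy control-plane decision: enforce on the side with fewer replicas,
    so fewer proxies must hold rate-limit state. `replicas` maps service
    name -> number of running instances."""
    if replicas[policy.src] <= replicas[policy.dst]:
        return ("client-side", policy.src)
    return ("server-side", policy.dst)

if __name__ == "__main__":
    policy = RateLimitPolicy(src="frontend", dst="checkout", limit_rps=200)
    replicas = {"frontend": 12, "checkout": 3}
    print(choose_enforcement_point(policy, replicas))  # ('server-side', 'checkout')
```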
2024
-
arXiv
Preprint
C3: Learning Congestion Controllers with Formal Certificates
Chenxi Yang, Divyanshu Saxena, Rohit Dwivedula, Kshiteej Mahajan, Swarat Chaudhuri, and Aditya Akella
Learning-based congestion controllers offer better adaptability compared to traditional heuristic algorithms. However, the inherent unreliability of learning techniques can cause learning-based controllers to behave poorly, creating a need for formal guarantees. While methods for formally verifying learned congestion controllers exist, these methods offer binary feedback that cannot optimize the controller toward better behavior. We improve this state-of-the-art via C3, a new learning framework for congestion control that integrates the concept of formal certification in the learning loop. C3 uses an abstract interpreter that can produce robustness and performance certificates to guide the training process, rewarding models that are robust and performant even on worst-case inputs. Our evaluation demonstrates that, unlike state-of-the-art learned controllers, C3-trained controllers provide both adaptability and worst-case reliability across a range of network conditions.
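A heavily simplified sketch of the certificate-in-the-loop idea: a (stubbed) verifier returns a worst-case bound on controller performance over a region of network conditions, and the training objective mixes average-case reward with that bound. The toy controller, reward shape, and corner-checking "certificate" are illustrative assumptions; C3's actual certificates come from abstract interpretation, not sampling.

```python
import random

def toy_controller(gain, bandwidth, rtt):
    """Stand-in 'controller': sending rate as a simple function of conditions."""
    return min(gain * bandwidth, bandwidth)  # never exceed link bandwidth

def reward(gain, bandwidth, rtt):
    """Toy reward: throughput minus a penalty for queueing delay."""
    rate = toy_controller(gain, bandwidth, rtt)
    return rate - 0.1 * max(0.0, rate * rtt - 1.0)

def worst_case_bound(gain, region):
    """Stub for a certificate: lower-bound the reward over the whole region.
    Here we check region corners, which is only valid for this monotone toy;
    a real certificate would come from an abstract interpreter."""
    corners = [(bw, rtt) for bw in region["bandwidth"] for rtt in region["rtt"]]
    return min(reward(gain, bw, rtt) for bw, rtt in corners)

def training_objective(gain, samples, region, beta=0.5):
    """Mix average-case reward with the certified worst-case bound, so training
    favors controllers that are also robust on worst-case inputs."""
    avg = sum(reward(gain, bw, rtt) for bw, rtt in samples) / len(samples)
    return (1 - beta) * avg + beta * worst_case_bound(gain, region)

if __name__ == "__main__":
    region = {"bandwidth": (1.0, 10.0), "rtt": (0.01, 0.2)}
    samples = [(random.uniform(1, 10), random.uniform(0.01, 0.2)) for _ in range(50)]
    best = max((g / 10 for g in range(1, 21)),
               key=lambda g: training_objective(g, samples, region))
    print("selected gain:", best)
```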
2023
-
MLSys at NeurIPS ’23
Workshop
On a Foundation Model for Operating Systems
Divyanshu Saxena, Nihal Sharma, Donghyun Kim, Rohit Dwivedula, Jiayi Chen, Chenxi Yang, Sriram Ravula, Zichao Hu, Aditya Akella, Sebastian Angel, Joydeep Biswas, Swarat Chaudhuri, Isil Dillig, Alex Dimakis, P. Brighten Godfrey, Daehyeok Kim, Christopher Rossbach, and Gang Wang
This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components such as CPU, memory, and network subsystems are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors to assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and into creating the next generation of operating systems for the evolving computing landscape.
-
SoCC ’23
Conference
Yama: Providing Performance Isolation for Black-Box Offloads
Tao Ji, Divyanshu Saxena, Brent E. Stephens, and Aditya Akella
The sharing of clusters with various on-NIC offloads by high-level entities (users, containers, etc.) has become increasingly common. Performance isolation across these entities is desired because the offloads can become bottlenecks due to the limited capacity of the hardware. However, existing approaches that provide scheduling and resource management for NIC offloads all require customization of the NIC or the offloads, whereas commodity off-the-shelf NICs and offloads with proprietary implementations are widely deployed in datacenters. This paper presents Yama, the first solution to enable per-entity isolation in the sharing of such black-box NIC offloads. Yama provides a generic framework that captures a common abstraction of the operation of most offloads, which allows operators to incorporate existing offloads. The framework proactively probes for the performance of the offloads with auxiliary workloads and enforces isolation at the initiator side. Yama also accommodates chained offloads. Our evaluation shows that 1) Yama achieves per-entity max-min fairness for various types of offloads and in complicated offload chaining scenarios; 2) Yama quickly converges after changes in the equilibrium; and 3) Yama adds negligible overhead to application workloads.
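For reference, the max-min fairness objective mentioned in the evaluation can be computed with the standard progressive-filling procedure sketched below; the probing and initiator-side enforcement machinery that Yama actually builds is not modeled here, and the entities and units are illustrative.

```python
def max_min_fair_allocation(capacity, demands):
    """Progressive filling: give every entity an equal share, cap it at the
    entity's demand, and redistribute leftover capacity among the rest.
    `demands` maps entity -> requested offload throughput."""
    alloc = {}
    remaining = dict(demands)
    cap_left = capacity
    while remaining and cap_left > 0:
        share = cap_left / len(remaining)
        # Entities whose demand fits under the equal share are fully satisfied.
        satisfied = {e: d for e, d in remaining.items() if d <= share}
        if not satisfied:
            # Everyone wants more than the fair share: split capacity equally.
            for e in remaining:
                alloc[e] = share
            return alloc
        for e, d in satisfied.items():
            alloc[e] = d
            cap_left -= d
            del remaining[e]
    for e in remaining:
        alloc[e] = 0.0
    return alloc

if __name__ == "__main__":
    # Three tenants sharing a 10 Gbps offload; demands in Gbps.
    print(max_min_fair_allocation(10.0, {"tenant-a": 2.0, "tenant-b": 6.0, "tenant-c": 6.0}))
    # -> tenant-a gets 2.0, tenant-b and tenant-c get 4.0 each
```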
-
arXiv
Preprint
Dirigo: Self-scaling Stateful Actors For Serverless Real-time Data Processing
We propose Dirigo, a distributed stream processing service built atop virtual actors. Dirigo achieves both a high level of resource efficiency and performance isolation driven by user intent (SLO). To improve resource efficiency, Dirigo adopts a serverless architecture that enables time-sharing of compute resources among streaming operators, both within and across applications. Meanwhile, Dirigo improves performance isolation by inheriting the property of function autoscaling from the serverless architecture. Specifically, Dirigo proposes (i) the dual-mode actor, an actor abstraction that dynamically provides orderliness guarantees for streaming operators during autoscaling, and (ii) a data plane scheduling mechanism, along with its API, that allows scheduling and scaling at message-level granularity.
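A toy sketch of one way an actor can preserve message order across a scaling event, in the spirit of the dual-mode idea described above: while operator state is being migrated, the actor buffers messages instead of processing them, then replays the buffer in order on the new instance. The class, mode names, and handoff protocol are illustrative assumptions, not Dirigo's design or API.

```python
from collections import deque

class StreamActor:
    def __init__(self):
        self.state = {}            # per-key running counts (toy operator)
        self.mode = "processing"   # or "migrating"
        self.buffer = deque()

    def on_message(self, key, value):
        if self.mode == "migrating":
            self.buffer.append((key, value))   # hold messages, keep their order
            return
        self._apply(key, value)

    def start_migration(self):
        self.mode = "migrating"

    def finish_migration(self, new_actor):
        new_actor.state = self.state           # hand off operator state
        while self.buffer:                     # replay buffered messages in order
            new_actor._apply(*self.buffer.popleft())
        new_actor.mode = "processing"

    def _apply(self, key, value):
        self.state[key] = self.state.get(key, 0) + value

if __name__ == "__main__":
    old, new = StreamActor(), StreamActor()
    old.on_message("user-1", 1)
    old.start_migration()
    old.on_message("user-1", 1)   # buffered during the move, not lost or reordered
    old.finish_migration(new)
    print(new.state)              # {'user-1': 2}
```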
-
ApPLIED at PODC ’23
Workshop
Towards Efficient Microservice Communication
Divyanshu Saxena, William Zhang, Madhav Tummala, Saksham Goel, and Aditya Akella
Distributed applications on the cloud are being developed and deployed as microservices rather than as monoliths. Service meshes have emerged as a way of specifying communication policies between microservices, and they have the potential to abstract the networking requirements of distributed applications from the application logic. However, current service mesh frameworks introduce significant performance and resource overheads. We study the overheads of service meshes and make a case for redesigning both the control plane and the data plane for service meshes. First, we propose the notion of Application Defined Middleboxes, which makes it possible for mesh control planes to reduce the overheads by optimizing where to implement application policies. Second, we demonstrate preliminary ideas on accelerating the data plane to further reduce the overheads.
-
ACM OSR ’23
Short Paper
Navigating Performance-Efficiency Tradeoffs in Serverless Computing: Deduplication to the Rescue!
Navigating the performance and efficiency trade-offs is critical for serverless platforms, where the providers ideally want to give the illusion of warm function startups while maintaining low resource costs. Limited controls, provided via toggling sandboxes between warm and cold states and keepalives, force operators to sacrifice significant resources to achieve good performance. We present Medes, a serverless framework that allows operators to navigate the trade-off space smoothly. Our approach takes advantage of the high duplication in warm sandboxes on serverless platforms to develop a new sandbox state, called a ‘dedup state’, that is more memory-efficient than the warm state and faster to restore from than the cold state. We use innovative techniques to identify redundancy with minimal overhead, and provide a simple management policy to balance performance and memory. Our evaluation demonstrates that Medes can provide up to 3.8X better end-to-end latencies and reduce the number of cold starts by 10-50% against the state-of-the-art baselines.
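A rough sketch of the redundancy-identification idea behind the dedup state: a sandbox's memory image is split into fixed-size chunks, each chunk is fingerprinted, and only chunks not already present in a shared registry need to be stored; restoring replays the chunk recipe. Chunk size, hashing, and the in-memory registry are illustrative choices, not Medes's actual mechanism.

```python
import hashlib

CHUNK_SIZE = 4096  # illustrative chunk size, one page

def dedup_snapshot(memory: bytes, registry: dict):
    """Split a sandbox memory image into chunks and record, per chunk, either
    the bytes (first occurrence) or a reference to an identical chunk already
    in `registry` (hash -> stored bytes shared across sandboxes)."""
    recipe = []      # how to rebuild this sandbox: an ordered list of chunk hashes
    new_bytes = 0
    for off in range(0, len(memory), CHUNK_SIZE):
        chunk = memory[off:off + CHUNK_SIZE]
        h = hashlib.sha256(chunk).hexdigest()
        if h not in registry:
            registry[h] = chunk
            new_bytes += len(chunk)
        recipe.append(h)
    return recipe, new_bytes

def restore(recipe, registry):
    """Rebuild the memory image from its recipe (the dedup-state restore path)."""
    return b"".join(registry[h] for h in recipe)

if __name__ == "__main__":
    registry = {}
    first = bytes(4096) * 30 + b"sandbox-A" * 10    # mostly zero pages
    second = bytes(4096) * 30 + b"sandbox-B" * 10
    recipe_a, stored_a = dedup_snapshot(first, registry)
    recipe_b, stored_b = dedup_snapshot(second, registry)
    print(stored_a, stored_b)                        # the second sandbox stores far less
    assert restore(recipe_a, registry) == first
```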
-
NSDI ’23
Poster
Application-tailored Communication with xMesh
Divyanshu Saxena, Saksham Goel, William Zhang, Madhav Tummala, and Aditya Akella
2022
-
EuroSys ’22
Conference
Memory Deduplication for Serverless Computing with Medes
Serverless platforms today impose rigid trade-offs between resource use and user-perceived performance. Limited controls, provided via toggling sandboxes between warm and cold states and keep-alives, force operators to sacrifice significant resources to achieve good performance. We present a serverless framework, Medes, that breaks the rigid trade-off and allows operators to navigate the trade-off space smoothly. Medes leverages the fact that the warm sandboxes running on serverless platforms have a high fraction of duplication in their memory footprints. We exploit these redundant chunks to develop a new sandbox state, called a dedup state, that is more memory-efficient than the warm state and faster to restore from than the cold state. We develop novel mechanisms to identify memory redundancy at minimal overhead while ensuring that the dedup containers’ memory footprint is small. Finally, we develop a simple sandbox management policy that exposes a narrow, intuitive interface for operators to trade off performance for memory by jointly controlling warm and dedup sandboxes. Detailed experiments with a prototype using real-world serverless workloads demonstrate that Medes can provide up to 1X-2.75X improvements in end-to-end latencies. The benefits of Medes are enhanced in memory pressure situations, where Medes can provide up to 3.8X improvements in end-to-end latencies. Medes achieves this by reducing the number of cold starts incurred by 10-50% against the state-of-the-art baselines.
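One way to picture the narrow operator interface the abstract mentions is a single memory budget filled greedily, as in the sketch below: sandboxes most likely to be reused stay warm, the next tier is demoted to the cheaper dedup state, and the rest go cold. The reuse scoring and per-state costs are illustrative assumptions, not the policy Medes implements.

```python
def plan_sandbox_states(sandboxes, memory_budget, warm_cost, dedup_cost):
    """Greedy sketch: sort sandboxes by expected reuse (e.g., recent invocation
    rate), keep as many as possible warm, demote the next ones to the dedup
    state, and mark the remainder cold. Costs are per-sandbox memory footprints."""
    plan = {}
    remaining = memory_budget
    for sb in sorted(sandboxes, key=lambda s: s["reuse_score"], reverse=True):
        if remaining >= warm_cost:
            plan[sb["id"]], remaining = "warm", remaining - warm_cost
        elif remaining >= dedup_cost:
            plan[sb["id"]], remaining = "dedup", remaining - dedup_cost
        else:
            plan[sb["id"]] = "cold"
    return plan

if __name__ == "__main__":
    sandboxes = [
        {"id": "fn-a", "reuse_score": 0.9},
        {"id": "fn-b", "reuse_score": 0.6},
        {"id": "fn-c", "reuse_score": 0.2},
    ]
    # Budget fits one warm sandbox (100 MB) plus one dedup sandbox (20 MB).
    print(plan_sandbox_states(sandboxes, memory_budget=120, warm_cost=100, dedup_cost=20))
    # -> {'fn-a': 'warm', 'fn-b': 'dedup', 'fn-c': 'cold'}
```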