We address the challenge of zeroth-order online convex optimization, where the objective function’s gradient is sparse: only a small number of dimensions have non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function’s gradient even when the only information available is a limited number of function samples. Our motivation stems from the optimization of large-scale queueing networks that process time-sensitive jobs. Here, a job must be processed by potentially many queues in sequence to produce an output, and the service time at any queue is a function of the resources allocated to that queue. Since resources are costly, the end-to-end latency of jobs must be balanced against the overall cost of the resources used. While the number of queues is substantial, the latency function primarily reacts to resource changes at only a few, rendering the gradient sparse. We tackle this problem by introducing the Compressive Online Gradient Optimization (CONGO) framework, which allows compressive sensing methods previously applied to stochastic optimization to achieve regret bounds with an optimal dependence on the time horizon, without the full problem dimension appearing in the bound. For specific algorithms, we reduce the samples required per gradient estimate to scale with the gradient’s sparsity factor rather than its full dimensionality. Numerical simulations and real-world microservices benchmarks demonstrate CONGO’s superiority over gradient descent approaches that do not account for sparsity.
ASPLOS ’25
Conference
Copper and Wire: Bridging Expressiveness and Performance for Service Mesh Policies
Divyanshu Saxena,
William Zhang,
Shankara Pailoor,
Isil Dillig,
and Aditya Akella
In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1
Distributed microservice applications require a convenient means of controlling L7 communication between services. Service meshes have emerged as a popular approach to achieving this. However, current service mesh frameworks are difficult to use – they burden developers in realizing even simple communication policies, lack compatibility with diverse dataplanes, and introduce performance and resource overheads. We identify the root causes of these drawbacks and propose a ground-up new mesh architecture that overcomes them. We develop novel abstractions for mesh communication, a new mesh policy language centered on these abstractions to enable expressive policies, and a novel control plane that enforces policies using minimal dataplane resources. We develop the precise semantics of our language abstractions and demonstrate how our control plane can use them to execute policies correctly and optimally. We build and evaluate a prototype on realistic workloads and policies and open-source production traces. Our results show that complex policies can be specified in up to 6.75X fewer lines and enforced with up to 2.6X lower tail latencies and up to 39% fewer CPU resources than today’s meshes.
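The policy language and control plane themselves are beyond an abstract, but the placement idea can be caricatured in a few lines. The sketch below is a toy model, not Copper/Wire’s actual API: every name (TimeoutPolicy, choose_enforcement_point, the load numbers) is hypothetical, and it only illustrates the notion of a control plane picking an enforcement point that spends the least dataplane resources.

# Illustrative sketch only: a toy model of a control plane choosing where to
# enforce a mesh policy so that the least-loaded dataplane component is used.
# All names here are hypothetical, not Copper/Wire's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class TimeoutPolicy:
    source: str      # calling service
    dest: str        # called service
    timeout_ms: int  # abort the request after this long

def choose_enforcement_point(policy: TimeoutPolicy, proxy_load: dict[str, float]) -> str:
    # A timeout can be enforced at either endpoint's proxy; pick the one
    # with spare capacity so no extra dataplane element is spun up.
    return min((policy.source, policy.dest), key=lambda svc: proxy_load.get(svc, 0.0))

proxy_load = {"frontend": 0.7, "cart": 0.2}  # fraction of each sidecar's CPU in use
p = TimeoutPolicy(source="frontend", dest="cart", timeout_ms=200)
print(choose_enforcement_point(p, proxy_load))  # -> cart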
2024
Arxiv
Preprint
C3: Learning Congestion Controllers with Formal Certificates
Learning-based congestion controllers offer better adaptability compared to traditional heuristic algorithms. However, the inherent unreliability of learning techniques can cause learning-based controllers to behave poorly, creating a need for formal guarantees. While methods for formally verifying learned congestion controllers exist, these methods offer binary feedback that cannot optimize the controller toward better behavior. We improve this state-of-the-art via C3, a new learning framework for congestion control that integrates the concept of formal certification in the learning loop. C3 uses an abstract interpreter that can produce robustness and performance certificates to guide the training process, rewarding models that are robust and performant even on worst-case inputs. Our evaluation demonstrates that unlike state-of-the-art learned controllers, C3-trained controllers provide both adaptability and worst-case reliability across a range of network conditions.
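C3’s abstract interpreter and reward are not reproduced here; the sketch below only illustrates the general idea of folding a formal certificate into the training signal, under simplifying assumptions: a linear controller, interval arithmetic as the abstract domain, and random search standing in for the learning loop. All names and constants are made up for the example.

# Illustrative sketch: reward a controller for certified worst-case behavior.
# Interval arithmetic gives a sound bound on a linear controller's output
# over a whole box of inputs; the bound feeds into the training reward.
# This mirrors the idea of certificate-guided learning, not C3's actual design.
import numpy as np

def interval_affine(W, b, lo, hi):
    """Sound output bounds of x -> W @ x + b for x in the box [lo, hi]."""
    Wp, Wn = np.maximum(W, 0), np.minimum(W, 0)
    return Wp @ lo + Wn @ hi + b, Wp @ hi + Wn @ lo + b

def certified_reward(W, b, lo, hi, rate_cap):
    """Toy performance term plus a bonus if the sending rate is certifiably
    within [0, rate_cap] for every input in the box."""
    out_lo, out_hi = interval_affine(W, b, lo, hi)
    certified = bool((out_lo >= 0).all() and (out_hi <= rate_cap).all())
    nominal = -np.abs(W @ ((lo + hi) / 2) + b - rate_cap / 2).sum()
    return nominal + (1.0 if certified else -1.0)

# Random search over controllers, keeping the best certificate-aware reward
# (a stand-in for the reinforcement-learning loop described in the paper).
rng = np.random.default_rng(0)
lo, hi = np.array([0.0, 0.0]), np.array([1.0, 1.0])  # observed-state box
best = None
for _ in range(200):
    W, b = rng.normal(size=(1, 2)), rng.normal(size=1)
    r = certified_reward(W, b, lo, hi, rate_cap=10.0)
    if best is None or r > best[0]:
        best = (r, W, b)
print("best certificate-aware reward:", best[0])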
Arxiv
Preprint
CONGO: Compressive Online Gradient Optimization with Application to Microservices Management
Jeremy Carleton,
Prathik Vijaykumar,
Divyanshu Saxena,
Dheeraj Narasimha,
Srinivas Shakkottai,
and Aditya Akella
We address the challenge of online convex optimization where the objective function’s gradient exhibits sparsity, indicating that only a small number of dimensions possess non-zero gradients. Our aim is to leverage this sparsity to obtain useful estimates of the objective function’s gradient even when the only information available is a limited number of function samples. Our motivation stems from distributed queueing systems like microservices-based applications, characterized by request-response workloads. Here, each request type proceeds through a sequence of microservices to produce a response, and the resource allocation across the collection of microservices is controlled to balance end-to-end latency with resource costs. While the number of microservices is substantial, the latency function primarily reacts to resource changes in a few, rendering the gradient sparse. Our proposed method, CONGO (Compressive Online Gradient Optimization), combines simultaneous perturbation with compressive sensing to estimate gradients. We establish analytical bounds on the requisite number of compressive sensing samples per iteration to maintain bounded bias of gradient estimates, ensuring sub-linear regret. By exploiting sparsity, we reduce the samples required per iteration to match the gradient’s sparsity, rather than the problem’s original dimensionality. Numerical experiments and real-world microservices benchmarks demonstrate CONGO’s superiority over multiple stochastic gradient descent approaches, as it quickly converges to performance comparable to policies pre-trained with workload awareness.
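To make the measurement model concrete, the sketch below (an unofficial illustration, not the paper’s code) estimates a sparse gradient from a handful of simultaneous-perturbation samples, since (f(x + delta*a) - f(x))/delta approximates the inner product of a with the gradient, and then inverts the compressed view with a basic orthogonal matching pursuit. The toy objective, sparsity level, and sample counts are assumptions for the example.

# Sketch of compressive gradient estimation in the spirit of CONGO:
# m << d perturbation samples give a compressed view A @ g of a sparse
# gradient g, which sparse recovery (here, plain OMP) can invert.
import numpy as np

def omp(A, y, s):
    """Orthogonal matching pursuit: recover an s-sparse g with A @ g ~= y."""
    residual, support = y.copy(), []
    for _ in range(s):
        j = int(np.argmax(np.abs(A.T @ residual)))  # best-correlated column
        if j not in support:
            support.append(j)
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    g = np.zeros(A.shape[1])
    g[support] = coef
    return g

rng = np.random.default_rng(1)
d, s, m, delta = 100, 3, 25, 1e-4           # dimension, sparsity, samples, step
idx = rng.choice(d, size=s, replace=False)  # the few coordinates that matter
w = rng.normal(size=s)
f = lambda x: float(w @ x[idx])             # toy objective with a sparse gradient

x = rng.normal(size=d)
A = rng.choice([-1.0, 1.0], size=(m, d)) / np.sqrt(m)          # sensing matrix
y = np.array([(f(x + delta * a) - f(x)) / delta for a in A])   # perturbation samples

g_hat = omp(A, y, s)
g_true = np.zeros(d); g_true[idx] = w
print("relative error:", np.linalg.norm(g_hat - g_true) / np.linalg.norm(g_true))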
2023
MLSys at NeurIPS ’23
Workshop
On a Foundation Model for Operating Systems
Divyanshu Saxena,
Nihal Sharma,
Donghyun Kim,
Rohit Dwivedula,
Jiayi Chen,
Chenxi Yang,
Sriram Ravula,
Zichao Hu,
Aditya Akella,
Sebastian Angel,
Joydeep Biswas,
Swarat Chaudhuri,
Isil Dillig,
Alex Dimakis,
P. Brighten Godfrey,
Daehyeok Kim,
Christopher Rossbach,
and Gang Wang
In 7th Workshop on Machine Learning for Systems. Held at 37th Conference on Neural Information Processing Systems (NeurIPS 2023).
This paper lays down the research agenda for a domain-specific foundation model for operating systems (OSes). Our case for a foundation model revolves around the observations that several OS components, such as the CPU, memory, and network subsystems, are interrelated and that OS traces offer the ideal dataset for a foundation model to grasp the intricacies of diverse OS components and their behavior in varying environments and workloads. We discuss a wide range of possibilities that then arise, from employing foundation models as policy agents to utilizing them as generators and predictors that assist traditional OS control algorithms. Our hope is that this paper spurs further research into OS foundation models and the creation of the next generation of operating systems for the evolving computing landscape.
SoCC ’23
Conference
Yama: Providing Performance Isolation for Black-Box Offloads
Tao Ji,
Divyanshu Saxena,
Brent E. Stephens,
and Aditya Akella
In Proceedings of the 2023 ACM Symposium on Cloud Computing
The sharing of clusters with various on-NIC offloads by high-level entities (users, containers, etc.) has become increasingly common. Performance isolation across these entities is desired because the offloads can become bottlenecks due to the limited capacity of the hardware. However, existing works that provide scheduling and resource management for NIC offloads all require customization of the NIC or the offloads, while commodity off-the-shelf NICs and offloads with proprietary implementations are widely deployed in datacenters. This paper presents Yama, the first solution to enable per-entity isolation in the sharing of such black-box NIC offloads. Yama provides a generic framework that captures a common abstraction of the operation of most offloads, which allows operators to incorporate existing offloads. The framework proactively probes for the performance of the offloads with auxiliary workloads and enforces isolation at the initiator side. Yama also accommodates chained offloads. Our evaluation shows that 1) Yama achieves per-entity max-min fairness for various types of offloads and in complicated offload chaining scenarios; 2) Yama quickly converges to changes in equilibrium; and 3) Yama adds negligible overhead to application workloads.
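Yama’s probing and enforcement machinery is more than a snippet, but the fairness objective it enforces is easy to state. The sketch below computes per-entity max-min fair shares of a measured offload capacity via standard water-filling; the capacity figure and entity demands are made up for illustration and are not from the paper.

# Water-filling computation of max-min fair shares: entities demanding less
# than an equal split keep their demand, and the leftover capacity is
# re-divided among the rest. This is the classic objective Yama enforces
# per entity; the numbers below are illustrative only.
def max_min_fair_shares(demands, capacity):
    shares = {}
    remaining = dict(demands)
    while remaining:
        fair = capacity / len(remaining)
        # Entities whose demand fits under the current equal split are capped
        # at their demand; they return unused capacity to the pool.
        satisfied = {e: d for e, d in remaining.items() if d <= fair}
        if not satisfied:
            shares.update((e, fair) for e in remaining)
            break
        for e, d in satisfied.items():
            shares[e] = d
            capacity -= d
            del remaining[e]
    return shares

# E.g., an offload measured at 10 Mops/s shared by three containers:
print(max_min_fair_shares({"c1": 2.0, "c2": 6.0, "c3": 5.0}, capacity=10.0))
# -> {'c1': 2.0, 'c2': 4.0, 'c3': 4.0}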
Arxiv
Preprint
Dirigo: Self-scaling Stateful Actors For Serverless Real-time Data Processing
We propose Dirigo, a distributed stream processing service built atop virtual actors. Dirigo achieves both a high level of resource efficiency and performance isolation driven by user intent (SLO). To improve resource efficiency, Dirigo adopts a serverless architecture that enables time-sharing of compute resources among streaming operators, both within and across applications. Meanwhile, Dirigo improves performance isolation by inheriting the property of function autoscaling from the serverless architecture. Specifically, Dirigo proposes (i) the dual-mode actor, an actor abstraction that dynamically provides an orderliness guarantee for streaming operators during autoscaling, and (ii) a data plane scheduling mechanism, along with its API, that allows scheduling and scaling at message-level granularity.
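Dirigo’s actual runtime is not shown in the abstract; the sketch below is a toy rendering of the dual-mode idea under assumed semantics: per-key order is preserved by buffering a key’s messages while that key is being migrated to a new worker during scale-out. The class, method names, and routing rule are all hypothetical.

# Toy rendering of a "dual-mode" actor idea: normally messages for a key
# flow straight to its worker (parallel mode); while a key is being moved
# during scale-out, its messages are buffered (ordered mode) and drained
# to the new worker afterwards. Semantics are assumptions, not Dirigo's API.
from collections import deque

class DualModeRouter:
    def __init__(self, workers):
        self.workers = list(workers)  # worker names
        self.moving = {}              # key -> buffered messages during a move
        self.route = {}               # key -> assigned worker

    def _owner(self, key):
        return self.route.setdefault(key, self.workers[hash(key) % len(self.workers)])

    def send(self, key, msg, deliver):
        if key in self.moving:                 # ordered mode: hold the line
            self.moving[key].append(msg)
        else:                                  # parallel mode: deliver now
            deliver(self._owner(key), key, msg)

    def start_move(self, key):
        self.moving[key] = deque()

    def finish_move(self, key, new_worker, deliver):
        self.route[key] = new_worker           # old worker drained by caller
        for msg in self.moving.pop(key):       # replay buffered msgs in order
            deliver(new_worker, key, msg)

log = []
r = DualModeRouter(["w0", "w1"])
deliver = lambda w, k, m: log.append((w, k, m))
r.send("user42", "m1", deliver)
r.start_move("user42")                         # scale-out begins
r.send("user42", "m2", deliver)                # buffered, order preserved
r.finish_move("user42", "w1", deliver)         # drained to the new worker
print(log)  # m1 at the original owner, then m2 at 'w1', in order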
In Proceedings of the 5th Workshop on Advanced Tools, Programming Languages, and PLatforms for Implementing and Evaluating Algorithms for Distributed Systems. Held in conjunction with PODC
Distributed applications on the cloud are being developed and deployed as microservices, as opposed to the monolithic architecture. Service meshes have emerged as a way of specifying communication policies between microservices, and they have the potential to abstract the networking requirements of distributed applications away from the application logic. However, current service mesh frameworks introduce significant performance and resource overheads. We study the overheads of service meshes and make a case for redesigning both the control plane and the data plane. First, we propose the notion of Application Defined Middleboxes, which makes it possible for mesh control planes to reduce overheads by optimizing where application policies are implemented. Second, we demonstrate preliminary ideas on accelerating the data plane to further reduce overheads.
ACM OSR ’23
Journal
Navigating Performance-Efficiency Tradeoffs in Serverless Computing: Deduplication to the Rescue!
Navigating the performance and efficiency trade-offs is critical for serverless platforms, where providers ideally want to give the illusion of warm function startups while maintaining low resource costs. Limited controls, provided via toggling sandboxes between warm and cold states and keepalives, force operators to sacrifice significant resources to achieve good performance. We present Medes, a serverless framework that allows operators to navigate the trade-off space smoothly. Our approach takes advantage of the high duplication among warm sandboxes on serverless platforms to develop a new sandbox state, called a ‘dedup state’, that is more memory-efficient than the warm state and faster to restore from than the cold state. We use innovative techniques to identify redundancy with minimal overhead, and provide a simple management policy to balance performance and memory. Our evaluation demonstrates that Medes can provide up to 3.8X better end-to-end latencies and reduce the number of cold starts by 10-50% against state-of-the-art baselines.
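The dedup state itself involves checkpoint/restore machinery, but the core observation (redundant memory chunks across warm sandboxes) is easy to illustrate. The sketch below fingerprints fixed-size chunks of two mock snapshots and stores only the chunks the base does not already hold; the chunk size and hashing scheme are illustrative assumptions, not Medes’s actual mechanism.

# Illustrative chunk-level dedup: fingerprint fixed-size chunks of a sandbox
# memory snapshot and store only chunks not already present in a base
# snapshot, keeping references for the rest. Chunk size and hashing are
# assumptions for the example, not Medes's actual design.
import hashlib

CHUNK = 4096  # bytes

def chunks(snapshot: bytes):
    return [snapshot[i:i + CHUNK] for i in range(0, len(snapshot), CHUNK)]

def dedup(snapshot: bytes, base_index: dict):
    """Return (unique_chunks, refs): refs name chunks already in base_index."""
    unique, refs = {}, []
    for c in chunks(snapshot):
        h = hashlib.sha256(c).hexdigest()
        refs.append(h)
        if h not in base_index:
            unique[h] = c
    return unique, refs

def restore(refs, base_index, unique):
    return b"".join(unique.get(h, base_index.get(h, b"")) for h in refs)

# Two mock snapshots sharing most pages (e.g., the same runtime image):
base = b"A" * (64 * CHUNK)
new = b"A" * (60 * CHUNK) + b"B" * (4 * CHUNK)
base_index = {hashlib.sha256(c).hexdigest(): c for c in chunks(base)}
unique, refs = dedup(new, base_index)
print(f"stored {len(unique)} of {len(refs)} chunks")  # -> stored 1 of 64 chunks
assert restore(refs, base_index, unique) == new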
Serverless platforms today impose rigid trade-offs between resource use and user-perceived performance. Limited controls, provided via toggling sandboxes between warm and cold states and keep-alives, force operators to sacrifice significant resources to achieve good performance. We present a serverless framework, Medes, that breaks the rigid trade-off and allows operators to navigate the trade-off space smoothly. Medes leverages the fact that the warm sandboxes running on serverless platforms have a high fraction of duplication in their memory footprints. We exploit these redundant chunks to develop a new sandbox state, called a dedup state, that is more memory-efficient than the warm state and faster to restore from than the cold state. We develop novel mechanisms to identify memory redundancy at minimal overhead while ensuring that the dedup containers’ memory footprint is small. Finally, we develop a simple sandbox management policy that exposes a narrow, intuitive interface for operators to trade off performance for memory by jointly controlling warm and dedup sandboxes. Detailed experiments with a prototype using real-world serverless workloads demonstrate that Medes can provide up to 1X-2.75X improvements in end-to-end latencies. The benefits of Medes are enhanced in memory-pressure situations, where Medes can provide up to 3.8X improvements in end-to-end latencies. Medes achieves this by reducing the number of cold starts incurred by 10-50% against the state-of-the-art baselines.
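The management policy described above can be caricatured as a single memory-budget knob. In the sketch below, a toy policy keeps the hottest functions warm, demotes the next tier to the cheaper dedup state, and leaves the rest cold; the per-state memory costs and the greedy rule are assumptions for illustration, not Medes’s actual policy.

# Toy sandbox-management policy in the spirit of Medes: under a memory
# budget, keep the most frequently invoked functions warm, demote the next
# tier to the dedup state, and leave the rest cold. The state costs below
# (MB per sandbox) and the greedy rule are illustrative assumptions.
WARM_MB, DEDUP_MB = 100, 20

def plan(invocation_rate: dict, budget_mb: float):
    decision = {}
    # Hotter functions get the more expensive (but faster) warm state first.
    for fn in sorted(invocation_rate, key=invocation_rate.get, reverse=True):
        if budget_mb >= WARM_MB:
            decision[fn], budget_mb = "warm", budget_mb - WARM_MB
        elif budget_mb >= DEDUP_MB:
            decision[fn], budget_mb = "dedup", budget_mb - DEDUP_MB
        else:
            decision[fn] = "cold"
    return decision

print(plan({"thumbnail": 50.0, "resize": 20.0, "report": 1.0}, budget_mb=140))
# -> {'thumbnail': 'warm', 'resize': 'dedup', 'report': 'dedup'}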