OptiServe: Cost-Aware, Performance-Driven, and Accuracy-Tuned Serverless Applications with ML Workloads

Authors

Boukani, Arian

Abstract

Serverless computing has emerged as a popular cloud paradigm due to its seamless scalability and cost-efficient, pay-as-you-go pricing model. Its potential to support machine learning (ML) inference workloads, including generative AI tasks, has led to growing adoption of ML functions within serverless applications. A key challenge, however, is selecting suitable ML models that balance execution time, deployment cost, and inference accuracy in latency- and cost-sensitive environments.

In this study, we present a framework for optimizing serverless applications that incorporate ML components through tri-objective optimization. We develop high-fidelity analytical models, augmented with lightweight profiling, to capture the trade-offs among cost, performance, and accuracy across different model choices. These models serve as the foundation for guiding ML model selection and deployment strategies to meet application-specific service-level objectives.
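To make the selection problem concrete, the following is a minimal sketch of choosing a (model, memory) configuration under latency and cost objectives. It is not OptiServe's actual analytical model. The latency formula (a fixed overhead plus compute time that scales inversely with memory), the per-GB-second rate, the model profiles, and all names such as ModelProfile, estimate, and select_configuration are assumptions made for illustration only.

```python
from dataclasses import dataclass
from itertools import product

# Illustrative placeholder rate in USD per GB-second (an assumption, not a quoted price).
PRICE_PER_GB_SECOND = 0.0000166667
# Memory size at which the hypothetical profiles below were "measured".
REFERENCE_MEMORY_MB = 1024


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical profiling result for one candidate ML model."""
    name: str
    accuracy: float    # fraction in [0, 1]
    overhead_s: float  # latency that does not shrink with more memory
    compute_s: float   # compute time at REFERENCE_MEMORY_MB


def estimate(model, memory_mb):
    """Toy estimate: compute time scales inversely with allocated memory."""
    latency_s = model.overhead_s + model.compute_s * REFERENCE_MEMORY_MB / memory_mb
    cost_usd = latency_s * (memory_mb / 1024) * PRICE_PER_GB_SECOND
    return latency_s, cost_usd


def select_configuration(models, memory_options_mb, max_latency_s, max_cost_usd):
    """Return the most accurate feasible configuration, or None if none is feasible.

    Ties on accuracy are broken in favour of the lower per-invocation cost.
    """
    feasible = []
    for model, memory_mb in product(models, memory_options_mb):
        latency_s, cost_usd = estimate(model, memory_mb)
        if latency_s <= max_latency_s and cost_usd <= max_cost_usd:
            feasible.append((model, memory_mb, latency_s, cost_usd))
    if not feasible:
        return None
    return max(feasible, key=lambda c: (c[0].accuracy, -c[3]))


# Made-up candidate models; the numbers are not taken from the study.
MODELS = [
    ModelProfile("small", accuracy=0.80, overhead_s=0.05, compute_s=0.20),
    ModelProfile("medium", accuracy=0.88, overhead_s=0.05, compute_s=0.60),
    ModelProfile("large", accuracy=0.93, overhead_s=0.05, compute_s=1.80),
]
MEMORY_OPTIONS_MB = [512, 1024, 2048, 4096]

best = select_configuration(MODELS, MEMORY_OPTIONS_MB,
                            max_latency_s=0.6, max_cost_usd=2e-5)
if best is not None:
    model, memory_mb, latency_s, cost_usd = best
    print(f"{model.name} @ {memory_mb} MB: {latency_s:.2f} s, ${cost_usd:.2e} per call")
```

Under these made-up numbers the most accurate model is ruled out by the cost limit, so the search settles on a mid-sized model at a memory size that satisfies both limits. Exhaustive enumeration is used only because the toy search space is tiny; it is not meant to reflect how the framework itself explores configurations.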

We validate our framework through experiments on AWS using real serverless applications. We also demonstrate its practicality through extensive what-if analyses that explore a wide range of application scenarios and configurations in under a minute. Our experiments on real-world applications show that OptiServe recommends memory and ML model configurations that achieve over 95% of the accuracy of ideal configurations in 89.64% of cases, enabling efficient, low-cost deployments while maintaining model accuracy and meeting performance targets.

Keywords

Computer science
