OptiServe: Cost-Aware, Performance-Driven, and Accuracy-Tuned Serverless Applications with ML Workloads
| dc.contributor.advisor | Khazaei, Hamzeh | |
| dc.contributor.author | Boukani, Arian | |
| dc.date.accessioned | 2026-03-10T16:20:39Z | |
| dc.date.available | 2026-03-10T16:20:39Z | |
| dc.date.copyright | 2026-01-29 | |
| dc.date.issued | 2026-03-10 | |
| dc.date.updated | 2026-03-10T16:20:39Z | |
| dc.degree.discipline | Computer Science | |
| dc.degree.level | Master's | |
| dc.degree.name | MSc - Master of Science | |
| dc.description.abstract | Serverless computing has emerged as a popular cloud paradigm due to its seamless scalability and cost-efficient, pay-as-you-go pricing model. Its potential to support machine learning (ML) inference workloads, including generative AI tasks, has led to growing adoption of ML functions within serverless applications. A key challenge, however, is selecting suitable ML models that balance execution time, deployment cost, and inference accuracy in latency- and cost-sensitive environments. In this study, we present a framework for optimizing serverless applications that incorporate ML components through tri-objective optimization. We develop high-fidelity analytical models, augmented with lightweight profiling, to capture the trade-offs among cost, performance, and accuracy across different model choices. These models serve as the foundation for guiding ML model selection and deployment strategies to meet application-specific service-level objectives. We validate our framework through real-world experiments on AWS using real serverless applications. Furthermore, we demonstrate its practicality by performing extensive what-if analyses, exploring a wide range of application scenarios and configurations, in under a minute. Our extensive experiments on real-world applications show that OptiServe recommends memory and ML model configurations that achieve over 95% of the accuracy of ideal configurations in 89.64% of cases, enabling efficient, low-cost deployments while maintaining model accuracy and meeting performance targets. | |
| dc.identifier.uri | https://hdl.handle.net/10315/43652 | |
| dc.language | en | |
| dc.rights | Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests. | |
| dc.subject | Computer science | |
| dc.subject.keywords | Serverless computing | |
| dc.subject.keywords | ML inference | |
| dc.subject.keywords | Function and application modelling | |
| dc.subject.keywords | Multi-objective optimization | |
| dc.title | OptiServe: Cost-Aware, Performance-Driven, and Accuracy-Tuned Serverless Applications with ML Workloads | |
| dc.type | Electronic Thesis or Dissertation |