Proactive & Fine-Grained Monitoring For Microservice Call Chains In Cloud-Native Applications Through Latency Distribution Prediction

Date

2025-04-10

Authors

Hussain, Hamza

Abstract

Modern cloud-native applications are distributed by nature, and their health is monitored through multiple channels. In this study, we propose a single, novel approach that leverages multi-channel monitoring data for fine-grained performance analysis, proactive anomaly prediction, and root-cause analysis in microservice-based applications. To this end, we employ microservice embeddings, Graph Neural Networks (GNNs), and Gated Recurrent Units (GRUs) to predict the latency distribution, rather than a single latency value, for each individual call within a microservice call chain, as well as the distribution of end-to-end latency. Our approach thus enables deeper insight into system performance and targeted diagnosis of anomalies. We evaluate on several benchmark datasets containing anomalies and show that our approach performs consistently across the latency spectrum while outperforming baseline latency prediction approaches by about 6%. Lastly, we show that our approach can be used efficiently to automate trace-based anomaly prediction and root-cause analysis.
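
As a rough illustration of why a predicted latency distribution can be more useful than a single predicted value, the sketch below flags the individual calls in a chain whose observed latency exceeds a predicted upper quantile. It is not the method of this work: the abstract does not say how the distribution is represented or trained, so the quantile representation, the pinball loss (a standard objective for quantile prediction), the function names, the call names, and all numbers are assumptions made for illustration only.

```python
def pinball_loss(observed, predicted, q):
    """Quantile (pinball) loss for one observation at quantile level q.

    Assumption: a quantile-based distribution; the source does not specify one.
    """
    diff = observed - predicted
    return q * diff if diff >= 0 else (q - 1) * diff


def flag_calls(observed_ms, predicted_upper_ms):
    """Return the calls whose observed latency exceeds the predicted upper quantile."""
    return [
        call
        for call, latency in observed_ms.items()
        if latency > predicted_upper_ms[call]
    ]


# Hypothetical call chain: call names and latencies are illustrative only.
predicted_p95_ms = {"gateway->cart": 40.0, "cart->db": 15.0, "cart->pricing": 25.0}
observed_ms = {"gateway->cart": 38.0, "cart->db": 42.0, "cart->pricing": 20.0}

# Per-call flags narrow the search for a root cause to specific edges of the chain.
suspects = flag_calls(observed_ms, predicted_p95_ms)
print(suspects)
```

With per-call upper quantiles, an anomaly in end-to-end latency can be traced to the specific call that left its expected range, which a single point prediction per call cannot express.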