EAGLE-APT: Edge-Aware Provenance Graph Learning with Node Encoding for Advanced Persistent Threat Detection and Attribution from System Audit Log
Abstract
Advanced Persistent Threats (APTs) represent some of the most challenging forms of cyberattacks, characterized by stealth, persistence, and multi-stage operations that evade traditional defenses. Detecting and attributing such campaigns to a known APT group requires methods that can capture long-term coordinated malicious activity within complex system interactions. This research introduces EAGLE-APT, an Edge-Aware Provenance Graph Learning framework with Node Encoding for APT detection and attribution from system audit logs. The proposed architecture comprises five core components: a provenance graph generator, a node feature extractor, a type-specific feature encoder, a malicious node detector, and an attribution module. The process begins with the provenance graph generator, which converts raw audit logs into heterogeneous provenance graphs that capture system entities and their causal relationships. These graphs are then enriched by the node feature extractor, which incorporates both semantic and structural information to represent the behavior of each entity more effectively. Next, the type-specific feature encoder transforms heterogeneous node features into a unified embedding space, ensuring that diverse data types contribute meaningfully to the representation. Building on this foundation, the malicious node detector utilizes an edge-aware graph neural network to identify suspicious nodes, taking into account both the contextual importance of neighbors and the nature of their connections. Finally, the attribution module analyzes the detected malicious subgraphs and classifies them into known APT groups, offering a foundation for informed response and defense strategies. To support evaluation, a comprehensive dataset of simulated APT campaigns was generated in a controlled enterprise environment, capturing realistic multi-stage attack behaviors. 
Together, these contributions provide both a novel framework for end-to-end detection and attribution and a reproducible dataset that can serve as a basis for advancing future research in APT defense.
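The first and fourth stages described above (provenance graph generation and edge-aware neighbor aggregation) can be illustrated with a minimal sketch. This is not the EAGLE-APT implementation: the `Event` record, the subject-to-object edge direction, the dot-product scoring, and the per-edge-type `edge_bias` table are all assumptions made for illustration, and a real system would learn the weights and use a graph learning library.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Hypothetical parsed audit-log record (field names are assumptions)."""
    subject: str       # acting entity, e.g. a process
    subject_type: str  # node type of the subject
    action: str        # becomes the edge type, e.g. "read", "write"
    obj: str           # entity acted upon, e.g. a file or socket
    obj_type: str      # node type of the object


def build_provenance_graph(events):
    """Turn events into a heterogeneous graph: typed nodes and typed edges."""
    node_types = {}
    edges = []  # (source, edge_type, destination) triples
    for e in events:
        node_types[e.subject] = e.subject_type
        node_types[e.obj] = e.obj_type
        # Assumption: edges point from subject to object for every action.
        edges.append((e.subject, e.action, e.obj))
    return node_types, edges


def edge_aware_aggregate(node, edges, features, edge_bias):
    """One aggregation step over the incoming neighbors of `node`.

    Each neighbor's score combines feature similarity (dot product) with a
    bias that depends on the edge type; scores are normalized with a softmax
    and used to weight the neighbor features.
    """
    h_v = features[node]
    incoming = [(src, etype) for src, etype, dst in edges if dst == node]
    if not incoming:
        return list(h_v)  # no neighbors: keep the node's own features
    scores = [
        sum(a * b for a, b in zip(features[src], h_v)) + edge_bias.get(etype, 0.0)
        for src, etype in incoming
    ]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    out = [0.0] * len(h_v)
    for (src, _), w in zip(incoming, exps):
        for i, x in enumerate(features[src]):
            out[i] += (w / total) * x
    return out


# Toy events; entity names are invented for the example.
events = [
    Event("winword.exe", "process", "write", "payload.dll", "file"),
    Event("rundll32.exe", "process", "read", "payload.dll", "file"),
    Event("rundll32.exe", "process", "connect", "10.0.0.5:443", "socket"),
]
node_types, edges = build_provenance_graph(events)
features = {
    "winword.exe": [1.0, 0.0],
    "rundll32.exe": [0.0, 1.0],
    "payload.dll": [0.0, 0.0],
    "10.0.0.5:443": [0.0, 0.0],
}
edge_bias = {"write": 1.0, "read": 0.0}  # hand-picked, not learned
aggregated = edge_aware_aggregate("payload.dll", edges, features, edge_bias)
```

In this toy setup the `write` edge receives a larger attention weight than the `read` edge, so the aggregated representation of `payload.dll` leans toward the features of the process that wrote it. That is the sense in which the aggregation depends on the type of connection as well as on the neighbor itself.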