A Graph-Based Deep Learning Model for Anti-Money Laundering
Abstract
Anti-money laundering (AML) refers to the laws, regulations, and procedures that financial institutions and other regulated entities must implement to identify and prevent the use of their services for illicit financial activities. Current AML solutions rely on rule-based algorithms, which do not scale and are ineffective against new, evolving, or complex money laundering patterns. Meanwhile, rapid technological advances and sophisticated new financial instruments have made money laundering methods more complex. Machine learning can learn and identify new or complex money laundering patterns. Within this context, this thesis makes two major contributions. First, we survey existing machine learning-based AML solutions from a data-oriented perspective, examining the proposed models in terms of the datasets used, input and output data, approaches to the class imbalance problem, and classification metrics. To the best of our knowledge, this survey is the first to focus on these aspects of data, classification metrics, and related issues (e.g., the class imbalance problem). Second, we propose an AML detection system and a graph-based machine learning model to identify suspicious transactions. The detection system first transforms a dataset of accounts and transactions into a graph structure and applies the node2vec (N2V) algorithm to convert the graphs into feature vectors. The feature vectors are then fed into a graph convolutional network (GCN), which classifies the transactions as normal or suspicious. (Each suspicious transaction, known as an alarm, is then investigated manually by a financial analyst to confirm whether it is a normal transaction or a money laundering transaction.)
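The graph-to-classifier pipeline described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes accounts are nodes and transactions are undirected edges, uses random vectors as a stand-in for node2vec embeddings, and applies a single untrained graph convolution in the commonly used normalized form ReLU(D^-1/2 (A + I) D^-1/2 X W). The function names and toy data are hypothetical.

```python
import numpy as np

def build_adjacency(transactions, num_accounts):
    """Symmetric adjacency matrix (assumption: accounts are nodes,
    each transaction is an undirected edge between two accounts)."""
    adj = np.zeros((num_accounts, num_accounts))
    for sender, receiver in transactions:
        adj[sender, receiver] = 1.0
        adj[receiver, sender] = 1.0
    return adj

def gcn_layer(adj, features, weights):
    """One graph convolution: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    a_hat = adj + np.eye(adj.shape[0])        # add self-loops
    degree = a_hat.sum(axis=1)                # always >= 1 because of self-loops
    d_inv_sqrt = np.diag(degree ** -0.5)
    a_norm = d_inv_sqrt @ a_hat @ d_inv_sqrt  # symmetric normalization
    return np.maximum(a_norm @ features @ weights, 0.0)

# Hypothetical toy data: 4 accounts, 5 transactions (sender, receiver).
transactions = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
rng = np.random.default_rng(0)

adj = build_adjacency(transactions, num_accounts=4)
embeddings = rng.normal(size=(4, 8))  # stand-in for node2vec feature vectors
weights = rng.normal(size=(8, 2))     # untrained layer weights, illustration only

hidden = gcn_layer(adj, embeddings, weights)
print(hidden.shape)  # (4, 2)
```

In a full system the embeddings would come from node2vec random walks over the graph and the weights would be learned from labeled transactions; the sketch only shows how each node's features are mixed with those of its neighbours.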
To overcome the inherent class imbalance of AML data (i.e., the number of money laundering transactions in a dataset is much smaller than the number of normal transactions), we combine several techniques, including over-sampling and classifier threshold moving. Our experimental results show that the proposed N2V-GCN system achieves very low false negative rates (money laundering transactions misclassified as normal transactions), reaching zero in one experiment. At the same time, the proposed system lowers the false alarm rate (normal transactions classified as suspicious transactions) to under 50%, much lower than the current industry standard of 90% or more.
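To make the imbalance-handling techniques and the two reported metrics concrete, here is a small sketch under stated assumptions. It shows random over-sampling (one of several possible over-sampling methods; the abstract does not say which was used) and threshold moving on hypothetical classifier scores. The false alarm rate is computed here as the share of alarms that turn out to be normal transactions, FP / (FP + TP), which is one reading of the definition above; the false negative rate is FN / (FN + TP). All names and numbers are illustrative, not results from the thesis.

```python
import numpy as np

def random_oversample(features, labels, rng):
    """Duplicate minority-class rows (label 1) until both classes are equal in size.
    Assumes label 1 is the minority class."""
    minority = np.flatnonzero(labels == 1)
    majority = np.flatnonzero(labels == 0)
    extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
    keep = np.concatenate([majority, minority, extra])
    return features[keep], labels[keep]

def alarm_rates(labels, scores, threshold):
    """Return (false negative rate, false alarm rate) at a decision threshold."""
    predicted = scores >= threshold               # True means "raise an alarm"
    tp = np.sum(predicted & (labels == 1))
    fp = np.sum(predicted & (labels == 0))
    fn = np.sum(~predicted & (labels == 1))
    fnr = fn / (fn + tp)                          # laundering missed
    false_alarm = fp / (fp + tp) if (fp + tp) else 0.0  # alarms that were normal
    return fnr, false_alarm

# Hypothetical data: 6 normal transactions (0) and 2 laundering transactions (1).
labels = np.array([0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.30, 0.80])
features = scores.reshape(-1, 1)

rng = np.random.default_rng(0)
balanced_x, balanced_y = random_oversample(features, labels, rng)
print(balanced_y.sum(), len(balanced_y))  # 6 12

default_rates = alarm_rates(labels, scores, threshold=0.5)
moved_rates = alarm_rates(labels, scores, threshold=0.25)
print(default_rates)  # FNR 0.5, false alarm rate 0.5
print(moved_rates)    # FNR 0.0, false alarm rate 0.6
```

Lowering the threshold in this toy example removes the missed laundering transaction at the cost of more false alarms, which is the trade-off that threshold moving is meant to tune.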