Computer Science
Permanent URI for this collection: https://hdl.handle.net/10315/38508
Recent Submissions
Deep Learning Models for Detecting Online Harmful Content (2025-11-11)
Wei, Feng; Nguyen, Uyen T. (Open Access)

Deep learning (DL) has emerged as a transformative technology with substantial impact across various domains, including cybersecurity. This dissertation leverages deep learning methods and their applications to address increasingly sophisticated cyber threats. DL methods are capable of learning complex and abstract features from large-scale data, making them well-suited for identifying and mitigating cyber threats that traditional methods might miss. This dissertation focuses on the practical implementation and evaluation of DL models for detecting real-world cybersecurity threats, namely clickbait, Twitter bots, and SMS spam. Specifically, we propose:

- a novel attention-based neural network model named Knowledge-Enhanced Clickbait Detector (KED) that uses linguistic knowledge graphs built from WordNet to guide the attention mechanisms. The proposed neural network can effectively capture discriminative features from local and global similarities via the proposed knowledge-enhanced attention mechanisms. Moreover, we incorporate human semantic knowledge into the neural network and its attention mechanisms to better capture semantic correlations of headline-article word pairs.
- a novel recurrent neural network (RNN) model to distinguish Twitter bots from human accounts based on the textual content of their tweets. We use several types of linguistic embeddings to encode tweets, namely word embeddings, character embeddings, part-of-speech embeddings, and named-entity embeddings. We avoid using handcrafted features, which require time-consuming and labor-intensive feature engineering. This advantage allows for faster and easier implementation and deployment of the bot detection scheme.
- a novel lightweight deep neural model called Lightweight Gated Recurrent Unit (LGRU) for SMS spam detection. We incorporate enriched semantics retrieved from external knowledge to assist in understanding SMS text inputs for more accurate detection. In addition, the lightweight model illustrates a method to minimize unnecessary complexity in training recurrent models without compromising performance, which we believe is applicable to many other complex recurrent models and applications.

Experimental results show that the above models outperform their counterparts, including state-of-the-art models/systems and other baseline models, in terms of predictive performance and/or running time. The proposed models provide robust, scalable, and real-time security solutions that can adapt to the rapidly changing landscape of cyber threats.

Simulation-Based Evaluation of Transaction Finality in Bitcoin Using CNSIM (2025-11-11)
Radjou, Amirreza; Liaskos, Sotirios (Open Access)

Blockchain consensus protocols must be thoroughly evaluated for security and resilience, but their large scale makes experimental testing in a lab setting challenging. While numerous simulators exist, there is a need for a more general framework that can translate simulation data into useful and comparable metrics. This thesis addresses this gap by adopting CNSim, a simulator developed at York University that introduces a finality-based approach to evaluating consensus networks. To study the Bitcoin protocol, CNSim was enhanced with a novel framework for modeling adversarial behaviors. Specifically, the Majority Attack was implemented to create a detailed simulation of double-spending scenarios. Using this extended simulator, a systematic evaluation was conducted to assess the attack's impact on transaction finality, quantifying how network resilience degrades as malicious hash power increases.
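The degradation that the CNSim experiments measure has a well-known analytical counterpart: Nakamoto's closed-form estimate of the probability that an attacker with hash-power share q ever reverses a transaction buried under z confirmations. The sketch below implements that standard whitepaper formula; it is an analytical reference point, not CNSim's simulation model:

```python
import math

def catch_up_probability(q: float, z: int) -> float:
    """Probability that an attacker with hash-power share q ever overtakes
    the honest chain from z blocks behind (Nakamoto 2008, closed form)."""
    p = 1.0 - q
    if q >= p:
        return 1.0  # a majority attacker eventually wins with certainty
    # Expected attacker progress while the honest chain mines z blocks.
    lam = z * (q / p)
    # Sum over the attacker's possible progress k, Poisson-weighted.
    prob = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        prob -= poisson * (1.0 - (q / p) ** (z - k))
    return prob

# Resilience degrades sharply as malicious hash power grows.
for q in (0.10, 0.30, 0.45):
    print(f"q={q:.2f}: P(double-spend beats 6 confirmations) = "
          f"{catch_up_probability(q, 6):.4f}")
```

At q = 0.10 the success probability after six confirmations is tiny, but it climbs quickly toward 1 as q approaches 0.5 — the same qualitative trend a finality-based simulation quantifies under more realistic network conditions.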
The findings provide valuable insights into the practical security limitations of the Bitcoin protocol and demonstrate the utility of a finality-based methodology for analyzing blockchain consensus mechanisms.

Evaluating and Enhancing LLMs for Deep Learning Code Generation with DL-Bench (2025-11-11)
DaghighFarsoodeh, Alireza; Pham, Hung Viet (Open Access)

Large Language Models (LLMs) have recently demonstrated remarkable capabilities in automated code generation, yet their performance on domain-specific tasks such as deep learning (DL) pipelines remains underexplored. This thesis addresses this gap by introducing DL-Bench, the first comprehensive benchmark dedicated to evaluating LLMs on DL-specific code generation. DL-Bench comprises 520 carefully curated function-level tasks spanning all stages of the machine learning workflow, including data pre- and post-processing, model construction, training, inference, and evaluation, and is systematically categorized by pipeline stage, task type, and input modality. This fine-grained design enables detailed performance analysis and exposes unique challenges of DL code generation, such as tensor shape mismatches, framework-specific errors, and brittle reliance on phrasing. Building on this benchmark, we further investigate robustness strategies for LLMs by proposing a prompt mutation pipeline combined with dual execution agreement. The pipeline systematically generates semantically equivalent prompt variations through lexical, grammatical, and naming transformations, which are then paired with model-generated test cases to diversify candidate solutions. Using a dual agreement framework, correct solutions are identified by their consistent success across test suites, mitigating common misinterpretations. To validate this approach, we evaluate three state-of-the-art LLMs (O4-Mini, DeepSeek R1 Basic, and Gemini 2.5 Pro) exclusively on DL-Bench.
Results show that while baseline performance on DL-Bench is substantially lower than on general-purpose benchmarks, prompt mutations consistently yield measurable improvements (up to +2.9% pass@1), demonstrating their value in uncovering alternative correct solutions. Overall, this thesis makes three key contributions: (i) the release of DL-Bench as a domain-specific, fine-grained benchmark for DL code generation; (ii) a systematic analysis of LLM weaknesses in DL contexts, supported by a taxonomy of mutation effects; and (iii) the design and evaluation of a mutation-based dual agreement framework that enhances LLM reliability. These contributions provide both practical evaluation tools and methodological insights for advancing LLMs in specialized scientific programming domains. Future directions include scaling DL-Bench with multi-modal tasks, maintaining it as a live benchmark to track recency effects, and incorporating broader metrics such as code efficiency and maintainability.

Multimodal Representation Learning in Medical Using Vision-Language Models (2025-11-11)
Baghbanzadeh, Negin; Dolatabadi, Elham (Open Access)

Recent advances in multimodal models such as LLaVA and InstructBLIP highlight the importance of high-quality image encoders, particularly in the biomedical domain, where figures and captions are complex. Existing medical vision–language datasets primarily emphasize scale, often overlooking data quality. In this thesis, we introduce OPEN-PMC, a carefully curated collection of biomedical image–text pairs derived from PubMed Central. OPEN-PMC incorporates multiple refinement steps, including compound figure decomposition with a modality-robust detector, subcaption segmentation, in-text reference extraction and summarization, and modality-aware classification. Using OPEN-PMC-2M, we conduct controlled experiments to quantify the effect of each processing step on contrastive pretraining.
Our findings show that subfigure decomposition and enriched captions substantially improve retrieval, zero-shot classification, and robustness, outperforming larger but noisier datasets. Scaling to OPEN-PMC-18M, one of the largest curated biomedical VL datasets to date, we demonstrate state-of-the-art performance while discussing remaining limitations in large-scale contextual augmentation and clinical validation.

AdapTrain: Adaptive Model Partitioning for Efficient Independent Subnet Training on Heterogeneous and Dynamic Cloud Infrastructures (2025-11-11)
Naderi, Mohammadhossein; Khazaei, Hamzeh (Open Access)

Modern distributed training systems face significant challenges in heterogeneous computing environments, where differences in computational resources among workers often lead to resource underutilization and extended training durations, particularly in resource-constrained settings. To address these challenges, we propose AdapTrain, a novel framework that dynamically adjusts model partitioning to align with the computational capacities of heterogeneous workers. AdapTrain reduces synchronization overhead, thereby minimizing total end-to-end training time by ensuring synchronized completion of training rounds across all workers. Its adaptive design enables robust performance under workload variations, inherent resource heterogeneity, and the multi-tenancy effects prevalent in cloud computing environments. An experimental evaluation on production workloads reveals that AdapTrain accelerates model convergence by more than 8x compared to current training methods.
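The core idea of capacity-aligned partitioning can be sketched in a few lines: give each worker a share of the model proportional to its measured throughput, so all workers finish a round at roughly the same time. The proportional rule and largest-remainder rounding below are our illustration, not AdapTrain's actual algorithm:

```python
def partition_units(throughputs, total_units):
    """Split `total_units` model units (e.g., subnet neurons) across workers
    in proportion to measured throughput, using largest-remainder rounding
    so every unit is assigned and faster workers receive larger shares."""
    total_tp = sum(throughputs)
    ideal = [t / total_tp * total_units for t in throughputs]
    alloc = [int(x) for x in ideal]           # floor of each ideal share
    leftover = total_units - sum(alloc)
    # Hand remaining units to the workers with the largest fractional parts.
    order = sorted(range(len(ideal)),
                   key=lambda i: ideal[i] - alloc[i], reverse=True)
    for i in order[:leftover]:
        alloc[i] += 1
    return alloc

# Two slow workers and one twice-as-fast worker share 8 units.
print(partition_units([1.0, 1.0, 2.0], 8))  # [2, 2, 4]
```

In a dynamic cloud setting, the throughput vector would be re-measured periodically and the partition recomputed, which is what keeps round completion synchronized as multi-tenancy effects shift worker speeds.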
Furthermore, AdapTrain integrates seamlessly into existing systems, introducing negligible performance overhead while significantly enhancing training efficiency.

On the Complexity of Telephone Broadcasting: From Cacti to Bounded Pathwidth Graphs (2025-11-11)
Seyed Javadi, Seyed Mohammad; Kamali, Shahin (Open Access)

In Telephone Broadcasting, the goal is to disseminate a message from a given source vertex of an input graph to all other vertices in the minimum number of rounds, where at each round, an informed vertex can send the message to at most one of its uninformed neighbors. We study the problem in cactus graphs and graphs of bounded pathwidth. Despite many previous efforts, the complexity of the problem in cactus graphs remained open. We settle this question by establishing the NP-completeness of broadcasting in cactus graphs and graphs of pathwidth 2. On the positive side, we present constant-factor approximation algorithms for the studied families of graphs, namely an algorithm with an approximation factor of 2 for cactus graphs and an approximation factor of O(1) for graphs of constant pathwidth.

Fitness-Based Recommender Systems for Reducing Sedentary Behaviour (2025-11-11)
Toyonaga, Shogo Kai; Oyibo, Kiemute (Open Access)

Obesity and sedentary behaviour are among the greatest global challenges to good health and wellbeing. The goal of this thesis is to promote physical activity among young adults by comparing the effectiveness of content-based and context-aware recommender systems on perceived post-intervention user experience, exercise motivation, and projected behaviour performance. Gender differences are also explored. A 73-person user study compares recommender systems that solely focus on generating fitness plans (control group) against alternatives that incorporate psychosocial frameworks and explainability into the generation process (experimental group).
The context-aware recommender systems provided the highest levels of perceived post-intervention user experience, exercise motivation, and projected behaviour performance compared to the content-based recommender systems. Among females, the experimental group, which leveraged persuasive design techniques, showed numerical gains in exercise motivation and projected behaviour performance compared to the control group; however, the interaction effect was non-significant. Future work should investigate hybrid recommender systems for generating personalized exercise recommendations.

Toward Trustworthy Automated Data Story Generation: Benchmarking, Multi-Agent Generation and Bias Evaluation in Data Storytelling (2025-11-11)
Islam, Mohammed Saidul; Prince, Enamul Hoque (Open Access)

Data-driven storytelling is a powerful method for conveying insights by combining narrative techniques with visualizations and text. In this thesis, we introduce a novel task for data story generation and a benchmark containing 1,449 stories from diverse sources. We propose a multi-step LLM agent framework that mimics the human storytelling process: one agent for planning and narration, and another for verification at each intermediary step. Results show that our proposed framework significantly outperforms non-agentic baselines. In parallel, we recognize that trustworthy storytelling must also be fair and unbiased. To this end, we conduct a large-scale empirical study to uncover systematic geo-economic bias in the foundational subtask of data storytelling: producing narrative summaries of charts. We further explore inference-time debiasing strategies and highlight the need for more robust bias mitigation methods.
Together, these contributions provide both a powerful generative system and a fairness-focused evaluation to ensure automated data storytelling is accurate, coherent, and ethically responsible.

Data Layout Recommendation for Big Data Systems via Large Language Models (2025-11-11)
So, Justin Chun Hei; Szlichta, Jarek (Open Access)

The physical layout of data is critical to the performance of analytical queries, especially in column-store systems like IBM Db2. Among layout strategies, Z-ordering is a popular technique that maps multi-dimensional data to a one-dimensional space while preserving locality. However, tuning Z-order is challenging: users must manually select the columns to include, and most systems assign equal weight to each column, ignoring the varying impact of different columns on query performance. We present LayZ, an LLM-directed advisor for automated data layout tuning in IBM Db2. LayZ analyzes SQL workloads to extract query execution plan features and creates compact prompts that preserve layout-relevant information, thereby reducing inference cost when using large language models. LayZ generates ranked layout configurations, including weighted Z-orderings that adapt bit allocations based on workload characteristics. These configurations are evaluated using a cost model to identify the best candidate layout for the target workload. Our system supports both base tables and materialized views, enabling performance recovery in queries that regress under a global physical design. Experimental results on the DSB workload show that LayZ outperforms heuristic and existing layout strategies, improving query performance by up to 90%.

Algorithms for Timely Bin Packing, Fair TCP Acknowledgement, and Variants (2025-11-11)
Aminian, Aida; Kamali, Shahin (Open Access)

The Timely Bin Packing problem is a variant of bin packing, which incorporates time constraints for packing items.
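For reference, the unweighted Z-ordering that the LayZ entry above generalizes simply interleaves the bits of each column value into one sort key. A minimal two-column sketch follows (our illustration, not LayZ's implementation; weighted Z-ordering instead skews how many bits each column contributes):

```python
def z_order_2d(x: int, y: int, bits: int = 16) -> int:
    """Interleave the bits of two column values into one Z-order key.
    Nearby (x, y) pairs tend to receive nearby keys, preserving locality."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x supplies the even bits
        key |= ((y >> i) & 1) << (2 * i + 1)   # y supplies the odd bits
    return key

# Sorting rows by this key clusters rows that are close in both columns.
print([z_order_2d(x, y) for x, y in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]])
```

Because each column contributes the same number of bits here, both columns influence the key equally; a weighted variant of the kind LayZ recommends would allocate more bits to columns the workload filters on most.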
This problem is related to the classic Dynamic TCP Acknowledgement problem, which has been extensively studied and has real-world applications. This thesis studies new algorithms and settings for these two related problems. For Timely Bin Packing, we present deterministic and randomized algorithms that improve the best-known competitive ratios. We further provide impossibility results for different settings and variants of the problem. We also study the problem under certain assumptions, such as restricted item sizes and integer arrival times. For Dynamic TCP Acknowledgement, we study relaxed settings that include machine-learned predictions. We consider the f_max objective, which balances the max-latency of acknowledged packets (thus conveying a notion of max-min fairness). For this setting, we study learning-augmented algorithms and present experimental results on both synthetic and real-world data.

Algorithmic Solutions for Interval and Rectangle Scheduling: Fairness, Prediction, and Beyond (2025-11-11)
Zhu, Wenhao; Kamali, Shahin (Open Access)

In the interval scheduling problem, the input is a set of intervals with integer endpoints, and the objective is to accept a maximum number of non-overlapping intervals. In the two-dimensional variant, namely rectangle scheduling, the input consists of a set of rectangles, and the goal is to select a maximum number of non-overlapping rectangles. These problems can be considered in both online and offline settings. In the offline setting, the input set is available at the beginning, and an algorithm decides whether to select an interval (or rectangle) with complete information about the input. In the online setting, however, the input appears sequentially, and an online algorithm must accept or reject an interval (or rectangle) upon its arrival, without any information about forthcoming items.
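For contrast with the online setting just described, offline interval scheduling admits a simple exact algorithm: the textbook earliest-finish-time greedy. The sketch below is not taken from the thesis and adopts the common half-open convention, under which intervals that merely touch at an endpoint do not overlap:

```python
def max_nonoverlapping(intervals):
    """Offline interval scheduling: sort by right endpoint and greedily
    accept every interval that starts at or after the last accepted end.
    Intervals are treated as half-open [start, end)."""
    chosen, last_end = [], float("-inf")
    for start, end in sorted(intervals, key=lambda iv: iv[1]):
        if start >= last_end:
            chosen.append((start, end))
            last_end = end
    return chosen

print(max_nonoverlapping([(1, 3), (2, 4), (3, 5), (0, 7)]))  # [(1, 3), (3, 5)]
```

No such exact guarantee is possible online, which is precisely why the competitive-ratio analysis in this thesis is needed.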
The decisions of an algorithm to accept or reject an item are final and irrevocable. Online interval scheduling has received considerable attention in the past few decades, and various algorithmic solutions have been proposed under different settings and models. In comparison, two-dimensional variants of the problem have been less studied. In this thesis, we study the online rectangle scheduling problem under the any-order arrival setting. Previous work on this topic has been limited to random-order arrival (where rectangles are generated in parallel, but their ordering is random) or to restrictions such as square scheduling. In contrast, we study the worst-case performance of online algorithms in the most general setting of the problem and establish tight upper and lower bounds of Theta((log T)^2) on the competitive ratio of the best online algorithms. Here, T is a parameter that bounds the maximum length of intervals. Furthermore, we consider an online rectangle scheduling setting where (possibly erroneous) predictions are provided to the online algorithm; here, predictions specify the presence or absence of rectangles in the input. Under this setting, we study an algorithm whose competitive ratio depends explicitly on the prediction error. In particular, our algorithm performs optimally when the predictions are perfect, and its performance degrades as the error increases. To make this algorithm robust against adversarial predictions (where the error is large), we also propose a hybrid approach that combines it with a purely online algorithm, and prove that this combined approach allows a trade-off between "consistency" and "robustness", which describe performance when predictions are perfect and adversarial, respectively.
In addition, we initiate the study of fairness aspects of the interval scheduling problem, where each interval belongs to a group, and the objective is to achieve a notion of fairness in which a "fair" number of intervals from each group is accepted. We study this problem under both absolute and asymptotic settings. In the asymptotic setting, the number of intervals in optimal solutions for each group is arbitrarily large, whereas this assumption is absent in the absolute setting. For each of these two settings, we study the power of offline and online algorithms, as well as deterministic and randomized algorithms.

A Unified Framework for High-Frame-Rate High-Dynamic-Range Video Synthesis (2025-11-11)
Nguyen, Thi Hue; Brown, Michael S. (Open Access)

Creating high-dynamic-range (HDR) video at high frame rates is a technically demanding and application-critical problem, particularly in domains such as cinematography and autonomous perception. The challenge arises from the limitations of conventional image sensors in capturing both temporal and radiometric fidelity. This work introduces a unified framework that jointly addresses HDR reconstruction and temporal interpolation from sequences captured with alternating exposures. In contrast to prior methods that focus only on middle-frame interpolation or rely on computationally intensive pipelines, our approach employs a lightweight, end-to-end network capable of generating HDR frames at arbitrary timesteps in real time on mid-range GPUs. To mitigate the need for ground-truth HDR video, we propose a novel self-supervised training paradigm that leverages reconstruction objectives designed to preserve both photometric accuracy and temporal coherence.
Experimental results demonstrate that our framework not only maintains competitive visual fidelity but also significantly reduces computational overhead compared to state-of-the-art baselines.

Deep Generative Models for Trajectory Prediction and Mobility Network Forecasting (2025-07-23)
Nadiri, Amirhossein; Papagelis, Manos (Open Access)

Predicting human mobility is essential for urban planning, traffic management, and epidemiology. This thesis tackles two intertwined challenges: accurately forecasting individual trajectories and inferring the resulting mobility network. First, we introduce TrajLearn, a Transformer-based deep generative model that treats trajectories as token sequences and employs spatially constrained beam search to predict each individual's next k locations with high precision. Building on these forecasts, we present MobiNetForecast, which constructs and predicts the future topology of the mobility network by detecting when independently predicted trajectories intersect in space and time. Across large, real-world datasets, our unified framework achieves up to 40% relative gains in trajectory accuracy and up to 100x improvement in contact prediction over state-of-the-art baselines. These results demonstrate that combining advanced sequence modeling with explicit contact inference offers a powerful, scalable solution for dynamic mobility network forecasting.

Machine Unlearning for Mobility Data: An Algorithmic Perspective (2025-07-23)
Faraji, Ali; Papagelis, Manos (Open Access)

This work addresses machine unlearning for trajectory data: sequences of spatiotemporal points representing movement.
Motivated by growing privacy concerns and regulations such as the GDPR and CCPA, which grant users the right to request deletion of their personal data from trained models (the right to be forgotten), we propose TraceHiding, an algorithmic framework that removes the influence of specific trajectories without full model retraining. TraceHiding estimates the importance of each data point and applies gradient updates to reverse its influence proportionally. The framework includes: (i) estimation of data point importance, (ii) a teacher-student architecture, and (iii) a loss function that uses importance scores to compute reversal gradients. We evaluate TraceHiding on benchmark trajectory classification datasets. Results show it outperforms strong baselines and state-of-the-art unlearning methods (Bad-T, SCRUB, NegGrad, and NegGrad+), effectively removing the influence of deleted trajectories, preserving performance on retained data, and improving efficiency over retraining. To our knowledge, this is the first machine unlearning approach designed specifically for trajectory data.

Evolving Software Ecosystems: The Role of Community Dynamics in Shaping Software Extensions (2025-07-23)
Onagh, Elmira; Nayebi, Maleknaz (Open Access)

As software ecosystems (SECOs) grow across domains, understanding how tools evolve and differentiate functionally is critical for innovation. This manuscript-based thesis explores the evolution of software ecosystems and their influence on developers' motivations to extend their software products, in two ecosystems. In the first part, we focus on the evolution of open-source software by analyzing 6,983 GitHub Actions on the GitHub Marketplace, revealing widespread functional redundancy. A graph-based analysis of version histories and release patterns identifies early contributors and offers strategies to reduce duplication and align tools with emerging trends.
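The importance-weighted gradient reversal that TraceHiding builds on can be illustrated on a toy logistic-regression model: train normally, then ascend the loss gradient on the forget set. The fixed importance score and update rule below are our simplification (closer to the NegGrad baseline mentioned above), not TraceHiding's actual loss:

```python
import math
import random

def predict(w, x):
    """Logistic prediction for a two-feature point."""
    return 1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1])))

def grad(w, data):
    """Mean gradient of the logistic loss over `data`."""
    g = [0.0, 0.0]
    for x, y in data:
        err = predict(w, x) - y
        g[0] += err * x[0] / len(data)
        g[1] += err * x[1] / len(data)
    return g

def logloss(w, data):
    eps = 1e-12
    return -sum(y * math.log(predict(w, x) + eps)
                + (1 - y) * math.log(1 - predict(w, x) + eps)
                for x, y in data) / len(data)

random.seed(0)
data = []
for _ in range(200):
    x = (random.gauss(0, 1), random.gauss(0, 1))
    data.append((x, 1.0 if 2 * x[0] - x[1] > 0 else 0.0))

# 1) Standard training: gradient descent on the full dataset.
w = [0.0, 0.0]
for _ in range(300):
    g = grad(w, data)
    w = [w[0] - 0.5 * g[0], w[1] - 0.5 * g[1]]

# 2) Unlearning: reverse the gradient on the forget set, scaled by an
#    importance score (a fixed stand-in; TraceHiding estimates it per point).
forget = data[:20]
importance = 0.5
w_u = list(w)
for _ in range(50):
    g = grad(w_u, forget)
    w_u = [w_u[0] + 0.5 * importance * g[0],
           w_u[1] + 0.5 * importance * g[1]]

# The unlearned model incurs a measurably higher loss on the forgotten points.
print(logloss(w, forget), logloss(w_u, forget))
```

A practical method must also preserve accuracy on retained data, which is what the teacher-student architecture in the framework is for; this sketch only shows the reversal step.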
In the second part, in collaboration with industry partners, we examine proprietary software products, focusing on functional maturity, in particular AI-related features, in 116 patient-centric healthcare applications. We find that 86.21% of the apps remain in early AI adoption stages, indicating limited advancement toward AI integration. Together, these studies introduce a generalizable, data-driven framework for analyzing functional evolution across domains.

Use of Visual Content for Inference and Response in Q/A Forums (2025-07-23)
Ahmed, Faiz; Nayebi, Maleknaz (Open Access)

In the rapidly evolving landscape of developer communities, Q&A platforms serve as crucial resources for crowdsourcing developers' knowledge. A notable trend is the increasing use of images to convey complex queries more effectively. However, current state-of-the-art methods for duplicate question detection, which concentrate predominantly on text-based analysis, have not kept pace with this shift. Inspired by advances in image processing and by numerous software engineering studies illustrating the promising future of image-based communication on social coding platforms, we delved into image-based techniques for identifying duplicate questions on Stack Overflow. When our automated models focus solely on the text of Stack Overflow questions and omit the images, they overlook a significant aspect of the question. Previous research has demonstrated the complementary nature of images to text. To address this, we implemented two methods of image analysis: first, integrating the text extracted from images into the question text, and second, evaluating the images based on their visual content using image captions. After a rigorous evaluation of our models, it became evident that the resulting improvements were relatively modest, approximately 1% on average. This marginal enhancement falls short of a substantial impact.
As an encouraging outcome, our work lays the foundation for easy replication and hypothesis validation, allowing future research to build on our approach and explore novel solutions for more effective image-driven duplicate question detection.

VADViT: Vision Transformer-Driven Memory Forensics for Malicious Process Detection and Explainable Threat Attribution (2025-07-23)
Dehfouli, Yasin; Habibi Lashkari, Arash (Open Access)

The increasing complexity of modern malware limits traditional signature- and heuristic-based detection, necessitating advanced memory forensic techniques. Machine learning offers potential but struggles with outdated feature sets, the handling of large memory data, and forensic explainability. To address these challenges, we propose VADViT, a vision-based transformer model that detects malicious processes by analyzing Virtual Address Descriptor (VAD) memory regions. VADViT converts these structures into Markov, entropy, and intensity-based images and classifies them using a Vision Transformer (ViT) whose self-attention enhances detection accuracy. We also introduce BCCC-MalMem-SnapLog-2025, a dataset that logs process identifiers (PIDs) for precise VAD extraction without dynamic analysis. Experimental results show 99% accuracy in binary classification and a 93% macro-average F1 score in multi-class detection. Additionally, attention-based sorting improves forensic analysis by ranking the most relevant malicious VAD regions, narrowing the search space for forensic investigators.

Application and Optimization of Prompt Engineering Techniques for Code Generation in Large Language Models (2025-07-23)
Wang, Chung-Yu; Pham, Hung Viet (Open Access)

Large Language Models have demonstrated remarkable capabilities across various domains, particularly in code generation and task-oriented reasoning.
However, their accuracy and reliability in generating correct solutions remain a challenge due to the lack of task-specific prior knowledge and the limitations of existing prompt engineering techniques. Current state-of-the-art approaches, such as PAL, rely on manually crafted prompts and examples but often produce suboptimal results. Additionally, while numerous prompt engineering techniques have been developed to improve performance, selecting the most effective technique for a given task remains difficult, since different queries exhibit varying levels of complexity. This work presents an integrated approach to enhance the application and optimization of prompt engineering for code generation. First, it introduces TITAN, a novel framework that refines language model reasoning and task execution through step-back and chain-of-thought prompting. TITAN eliminates the need for extensive manual task-specific instructions by leveraging analytical and code-generation capabilities, achieving state-of-the-art zero-shot performance on multiple tasks. Second, it proposes PET-Select, a prompt-engineering-agnostic model that classifies queries based on code complexity and dynamically selects the most suitable prompt engineering technique using contrastive learning. This approach enables PET-Select to optimize prompt selection, leading to improved accuracy and significant reductions in token usage. Comprehensive evaluations across diverse benchmarks, including HumanEval, MBPP, and APPS, demonstrate the effectiveness of TITAN and PET-Select. TITAN achieves up to 7.6 percent improvement over existing zero-shot methods, while PET-Select enhances pass@1 accuracy by up to 1.9 percent and reduces token consumption by 49.9 percent.
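The pass@1 figures reported here and in the DL-Bench entry above are conventionally computed with the unbiased pass@k estimator introduced with HumanEval (Chen et al., 2021); assuming that convention, the metric is:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples drawn
    (without replacement) from n generations, c of which are correct,
    passes the tests."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

print(pass_at_k(10, 0, 1))   # 0.0  (no correct generations)
print(pass_at_k(10, 5, 1))   # 0.5
print(pass_at_k(10, 5, 10))  # 1.0
```

With this estimator, a "+1.9 percent pass@1" improvement means the expected chance that a single sampled generation is correct rose by 1.9 percentage points on the benchmark.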
This work represents a significant advancement in optimizing prompt engineering for code generation in large language models, offering a robust and automated solution for improving performance on complex and diverse programming tasks.

Explainability is a Game for Probabilistic Bisimilarity Distances (2025-07-23)
Nanah Ji, Anto; van Breugel, Franck (Open Access)

Software bugs cost trillions annually, requiring better bug detection tools. Testing is widely used but has limitations, especially for non-deterministic software, where code produces different outputs even with fixed inputs due to randomness and concurrency. Labelled Markov chains model randomness but suffer from the state space explosion problem, where the number of states grows exponentially with system complexity. One solution is to identify behaviorally equivalent states using probabilistic bisimilarity. However, this method is not robust: small changes in probabilities can affect equivalences. To address this, probabilistic bisimilarity distances, a quantitative generalization of probabilistic bisimilarity, were introduced. These distances have game-theoretic characterizations. This thesis illustrates how optimal policies, known as players' strategies, can explain the distances. We formulate 1-maximal and 0-minimal policies and argue that they lead to better explanations. We present algorithms for computing these policies, prove an exponential lower bound for the 1-maximal algorithm, and show that symmetries simplify policies and, hence, explanations.

Guiding Expert Database Tuning with Explainable AI (2025-07-23)
Chai, Andrew Brian Frederick; Szlichta, Jarek (Open Access)

Modern database systems, such as IBM Db2, rely on cost-based optimizers to improve workload performance. However, their decision-making processes are difficult to interpret.
Tuning them for specific workloads remains challenging due to their complexity, numerous configuration options, and interactions with unique workload characteristics. Additionally, database systems increasingly rely on black-box machine learning models within the optimizer and in automatic tuning tools. These black-box models lack interpretability, hindering expert trust and debugging. We propose GEX, a system that provides interpretable insights into database optimizer behaviour using explainable AI (XAI) techniques. We adapt XAI techniques for generating perturbation-based saliency maps from surrogate models to the domain of SQL queries. With GEX, we propose a framework for how saliency scores can guide experts in system tuning tasks such as statistical view creation, configuration parameter adjustment, and query rewriting. We demonstrate the ability of GEX to capture and communicate optimizer behaviour through experimental evaluation on these tasks using the TPC-DS benchmark and IBM Db2.
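Perturbation-based saliency of the kind GEX adapts can be illustrated generically: perturb one input feature at a time and score it by how much the model's output changes. The toy numeric "cost model" below is our stand-in; GEX operates over SQL query-plan features rather than plain numbers:

```python
def saliency(model, x, baseline=0.0):
    """Perturbation-based saliency: replace one feature at a time with a
    baseline value and record the absolute change in the model's output."""
    base_out = model(x)
    scores = []
    for i in range(len(x)):
        perturbed = list(x)
        perturbed[i] = baseline
        scores.append(abs(model(perturbed) - base_out))
    return scores

def cost(x):
    """Toy surrogate cost model: feature 0 dominates the estimate."""
    return 3.0 * x[0] + 0.5 * x[1] + 0.1 * x[2]

# Feature 0 receives the largest saliency score.
print(saliency(cost, [2.0, 2.0, 2.0]))
```

In the GEX setting, a high score on a query-plan feature points the expert at the plan element most responsible for the optimizer's estimate, which is what makes saliency actionable for tuning tasks such as statistical view creation.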