Computer Science
Permanent URI for this collection: https://hdl.handle.net/10315/38508
Recent Submissions
Item type: Item, Access status: Open Access
Deep Learning Models for Detecting Online Harmful Content (2025-11-11)
Wei, Feng; Nguyen, Uyen T.
Deep learning (DL) has emerged as a transformative technology with substantial impact across various domains, including cybersecurity. This dissertation leverages deep learning methods and their applications to address increasingly sophisticated cyber threats. DL methods are capable of learning complex and abstract features from large-scale data, making them well-suited for identifying and mitigating cyber threats that traditional methods might miss. This dissertation focuses on the practical implementation and evaluation of DL models for detecting real-world cybersecurity threats, namely, clickbait, Twitter bots and SMS spam. Specifically, we propose:
- a novel attention-based neural network model named Knowledge-Enhanced Clickbait Detector (KED) that uses linguistic knowledge graphs built from WordNet to guide the attention mechanisms. The proposed neural network can effectively capture discriminative features from local and global similarities via the proposed knowledge-enhanced attention mechanisms. Moreover, we incorporate human semantic knowledge into the neural network and its attention mechanisms to better capture semantic correlations of headline-article word pairs.
- a novel recurrent neural network (RNN) model to distinguish Twitter bots from human accounts based on textual content of their tweets. We use several types of linguistic embeddings to encode tweets, namely, word embeddings, character embeddings, part-of-speech embeddings, and named-entity embeddings. We avoid using handcrafted features, which require time-consuming and labor-intensive feature engineering. This advantage allows for faster and easier implementation and deployment of the bot detection scheme.
- a novel lightweight deep neural model called Lightweight Gated Recurrent Unit (LGRU) for SMS spam detection.
We incorporate enhancing semantics retrieved from external knowledge to assist in understanding SMS text inputs for more accurate detection. In addition, the lightweight model illustrates a method to minimize unnecessary complexity in training recurrent models without compromising the performance, which we believe is applicable to many other complex recurrent models for other applications.
Experimental results show that the above models outperform their counterparts, including state-of-the-art models/systems and other baseline models, in terms of predictive performance and/or running time. The proposed models provide robust, scalable, and real-time security solutions that can adapt to the rapidly changing landscape of cyber threats.

Item type: Item, Access status: Open Access
Simulation-Based Evaluation of Transaction Finality in Bitcoin Using CNSIM (2025-11-11)
Radjou, Amirreza; Liaskos, Sotirios
Blockchain consensus protocols must be thoroughly evaluated for security and resilience, but their large scale makes experimental testing in a lab setting challenging. While numerous simulators exist, there is a need for a more general framework that can translate simulation data into useful and comparable metrics. This thesis addresses this gap by adopting CNSim, a simulator developed at York University that introduces a finality-based approach to evaluating consensus networks. To study the Bitcoin protocol, CNSim was enhanced by designing and implementing a novel framework for modeling adversarial behaviors. Specifically, the Majority Attack was implemented to create a detailed simulation for double-spending scenarios. Using this extended simulator, a systematic evaluation was conducted to assess the attack's impact on transaction finality, quantifying how network resilience degrades as malicious hash power increases.
The findings provide valuable insights into the practical security limitations of the Bitcoin protocol and successfully demonstrate the utility of a finality-based methodology for analyzing blockchain consensus mechanisms.

Item type: Item, Access status: Open Access
Evaluating and Enhancing LLMs for Deep Learning Code Generation with DL-Bench (2025-11-11)
DaghighFarsoodeh, Alireza; Pham, Hung Viet
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in automated code generation, yet their performance on domain-specific tasks such as deep learning (DL) pipelines remains underexplored. This thesis addresses this gap by introducing DL-Bench, the first comprehensive benchmark dedicated to evaluating LLMs on DL-specific code generation. DL-Bench comprises 520 carefully curated function-level tasks spanning all stages of the machine learning workflow, including data pre- and post-processing, model construction, training, inference, and evaluation, and is systematically categorized by pipeline stage, task type, and input modality. This fine-grained design enables detailed performance analysis and exposes unique challenges of DL code generation, such as tensor shape mismatches, framework-specific errors, and brittle reliance on phrasing. Building on this benchmark, we further investigate robustness strategies for LLMs by proposing a prompt mutation pipeline combined with dual execution agreement. The pipeline systematically generates semantically equivalent prompt variations through lexical, grammatical, and naming transformations, which are then paired with model-generated test cases to diversify candidate solutions. Using a dual agreement framework, correct solutions are identified by their consistent success across test suites, mitigating common misinterpretations. To validate this approach, we evaluate three state-of-the-art LLMs, O4-Mini, DeepSeek R1 Basic, and Gemini 2.5 Pro, exclusively on DL-Bench.
Results show that while baseline performance on DL-Bench is substantially lower than on general-purpose benchmarks, prompt mutations consistently yield measurable improvements (up to +2.9% pass@1), demonstrating their value in uncovering alternative correct solutions. Overall, this thesis makes three key contributions: (i) the release of DL-Bench as a domain-specific, fine-grained benchmark for DL code generation, (ii) a systematic analysis of LLM weaknesses in DL contexts supported by a taxonomy of mutation effects, and (iii) the design and evaluation of a mutation-based dual agreement framework that enhances LLM reliability. These contributions provide both practical evaluation tools and methodological insights for advancing LLMs in specialized scientific programming domains. Future directions include scaling DL-Bench with multi-modal tasks, maintaining it as a live benchmark to track recency effects, and incorporating broader metrics such as code efficiency and maintainability.

Item type: Item, Access status: Open Access
Multimodal Representation Learning in Medical Using Vision-Language Models (2025-11-11)
Baghbanzadeh, Negin; Dolatabadi, Elham
Recent advances in multimodal models such as LLaVA and InstructBLIP highlight the importance of high-quality image encoders, particularly in the biomedical domain where figures and captions are complex. Existing medical vision–language datasets primarily emphasize scale, often overlooking data quality. In this thesis, we introduce OPEN-PMC, a carefully curated collection of biomedical image–text pairs derived from PubMed Central. OPEN-PMC incorporates multiple refinement steps, including compound figure decomposition with a modality-robust detector, subcaption segmentation, in-text reference extraction and summarization, and modality-aware classification. Using OPEN-PMC-2M, we conduct controlled experiments to quantify the effect of each processing step on contrastive pretraining.
Our findings show that subfigure decomposition and enriched captions substantially improve retrieval, zero-shot classification, and robustness, outperforming larger but noisier datasets. Scaling to OPEN-PMC-18M, one of the largest curated biomedical VL datasets to date, we demonstrate state-of-the-art performance while discussing remaining limitations in large-scale contextual augmentation and clinical validation.

Item type: Item, Access status: Open Access
AdapTrain: Adaptive Model Partitioning for Efficient Independent Subnet Training on Heterogeneous and Dynamic Cloud Infrastructures (2025-11-11)
Naderi, Mohammadhossein; Khazaei, Hamzeh
Modern distributed training systems face significant challenges in heterogeneous computing environments, where heterogeneity in computational resources among workers often leads to resource underutilization and extended training durations, particularly in resource-constrained environments. To address these challenges, we propose Adaptive Model Partitioning for Efficient Independent Subnet Training on Heterogeneous and Dynamic Cloud Infrastructures (AdapTrain), a novel framework that dynamically adjusts model partitioning to align with the computational capacities of heterogeneous workers. AdapTrain reduces the overhead of synchronization, thereby minimizing total end-to-end training time by ensuring synchronized completion of training rounds across all workers. Its adaptive design enables robust performance under workload variations, inherent resource heterogeneity, and multi-tenancy effects prevalent in cloud computing environments. An experimental evaluation of production workloads reveals that AdapTrain accelerates model convergence by more than 8x compared to the current training methods.
Furthermore, AdapTrain integrates seamlessly into existing systems, introducing negligible system performance overhead while significantly enhancing training efficiency.

Item type: Item, Access status: Open Access
Deep Generative Models for Trajectory Prediction and Mobility Network Forecasting (2025-07-23)
Nadiri, Amirhossein; Papagelis, Manos
Predicting human mobility is essential for urban planning, traffic management, and epidemiology. This thesis tackles two intertwined challenges: accurately forecasting individual trajectories and inferring the resulting mobility network. First, we introduce TrajLearn, a Transformer-based deep generative model that treats trajectories as token sequences and employs spatially constrained beam search to predict each individual’s next k locations with high precision. Building on these forecasts, we present MobiNetForecast, which constructs and predicts the future topology of the mobility network by detecting when independently predicted trajectories intersect in space and time. Across large, real-world datasets, our unified framework achieves up to 40% relative gains in trajectory accuracy and up to 100x improvement in contact prediction over state-of-the-art baselines. These results demonstrate that combining advanced sequence modeling with explicit contact inference offers a powerful, scalable solution for dynamic mobility network forecasting.

Item type: Item, Access status: Open Access
Machine Unlearning for Mobility Data: An Algorithmic Perspective (2025-07-23)
Faraji, Ali; Papagelis, Manos
This work addresses machine unlearning for trajectory data, sequences of spatiotemporal points representing movement.
Motivated by growing privacy concerns and regulations like GDPR and CCPA, which grant users the right to request deletion of their personal data from trained models (the right to be forgotten), we propose TraceHiding, an algorithmic framework that removes the influence of specific trajectories without full model retraining. TraceHiding estimates the data point importance and applies gradient updates to reverse it proportionally. The framework includes: (i) data point importance estimation, (ii) a teacher-student architecture, and (iii) a loss function using Importance Scores to compute reversal gradients. We evaluate TraceHiding on benchmark trajectory classification datasets. Results show it outperforms strong baselines and state-of-the-art unlearning methods (Bad-T, SCRUB, NegGrad, and NegGrad+), effectively removing deleted trajectory influence, preserving retained data performance, and improving efficiency over retraining. To our knowledge, this is the first machine unlearning approach designed specifically for trajectory data.

Item type: Item, Access status: Open Access
Evolving Software Ecosystems: The Role of Community Dynamics in Shaping Software Extensions (2025-07-23)
Onagh, Elmira; Nayebi, Maleknaz
As software ecosystems (SECOs) grow across domains, understanding how tools evolve and differentiate functionally is critical for innovation. This manuscript-based thesis explores the evolution of the software ecosystem and its influence on developers’ motivations to extend their software products in two ecosystems. In the first part, we focus on the evolution of open-source software by analyzing 6,983 GitHub Actions on GitHub Marketplace, revealing widespread functional redundancy. A graph-based analysis of version histories and release patterns identifies early contributors and offers strategies to reduce duplication and align tools with emerging trends.
In the second part, in collaboration with industry partners, we examined proprietary software products, focusing on functional maturity, in particular AI-related features in 116 patient-centric healthcare applications. We find that 86.21% of apps remain in early AI adoption stages, indicating limited advancement toward AI integration. Together, these studies introduce a generalizable, data-driven framework for analyzing functional evolution across domains.

Item type: Item, Access status: Open Access
Use of Visual Content for Inference and Response in Q/A Forums (2025-07-23)
Ahmed, Faiz; Nayebi, Maleknaz
In the rapidly evolving landscape of developer communities, Q&A platforms serve as crucial resources for crowdsourcing developers' knowledge. A notable trend is the increasing use of images to convey complex queries more effectively. However, the current state-of-the-art method for duplicate question detection, which concentrates predominantly on text-based analysis, has not kept pace with this shift. Inspired by advancements in image processing and numerous studies in software engineering illustrating the promising future of image-based communication on social coding platforms, we delved into image-based techniques for identifying duplicate questions on Stack Overflow. When focusing solely on text analysis of Stack Overflow questions and omitting the use of images, our automated models overlook a significant aspect of the question. Previous research has demonstrated the complementary nature of images to text. To address this, we implemented two methods of image analysis: first, integrating the text from images into the question text, and second, evaluating the images based on their visual content using image captions. After a rigorous evaluation of our model, it became evident that the efficiency improvements achieved were relatively modest, averaging approximately 1%. This marginal enhancement falls short of what could be deemed a substantial impact.
As an encouraging aspect, our work lays the foundation for easy replication and hypothesis validation, allowing future research to build upon our approach and explore novel solutions for more effective image-driven duplicate question detection.

Item type: Item, Access status: Open Access
VADViT: Vision Transformer-Driven Memory Forensics for Malicious Process Detection and Explainable Threat Attribution (2025-07-23)
Dehfouli, Yasin; Habibi Lashkari, Arash
Modern malware's increasing complexity limits traditional signature and heuristic-based detection, necessitating advanced memory forensic techniques. Machine learning offers potential but struggles with outdated feature sets, large memory data handling, and forensic explainability. To address these challenges, we propose VADViT, a vision-based transformer model that detects malicious processes by analyzing Virtual Address Descriptor (VAD) memory regions. VADViT converts these structures into Markov, entropy, and intensity-based images, classifying them using a Vision Transformer (ViT) with self-attention to enhance detection accuracy. We also introduce BCCC-MalMem-SnapLog-2025, a dataset that logs process identifiers (PIDs) for precise VAD extraction without dynamic analysis. Experimental results show 99% accuracy in binary classification and a 93% macro-average F1 score in multi-class detection. Additionally, attention-based sorting improves forensic analysis by ranking the most relevant malicious VAD regions, narrowing down the search space for forensic investigators.

Item type: Item, Access status: Open Access
Application and Optimization of Prompt Engineering Techniques for Code Generation in Large Language Models (2025-07-23)
Wang, Chung-Yu; Pham, Hung Viet
Large Language Models have demonstrated remarkable capabilities across various domains, particularly in code generation and task-oriented reasoning.
However, their accuracy and reliability in generating correct solutions remain a challenge due to the lack of task-specific prior knowledge and the limitations of existing prompt engineering techniques. Current state-of-the-art approaches, such as PAL, rely on manually crafted prompts and examples but often produce suboptimal results. Additionally, while numerous prompt engineering techniques have been developed to improve performance, selecting the most effective technique for a given task remains difficult since different queries exhibit varying levels of complexity. This work presents an integrated approach to enhance the application and optimization of prompt engineering for code generation. First, it introduces TITAN, a novel framework that refines language model reasoning and task execution through step-back and chain-of-thought prompting. TITAN eliminates the need for extensive manual task-specific instructions by leveraging analytical and code-generation capabilities, achieving state-of-the-art zero-shot performance in multiple tasks. Second, it proposes PET-Select, a prompt-engineering-agnostic model that classifies queries based on code complexity and dynamically selects the most suitable prompt engineering technique using contrastive learning. This approach enables PET-Select to optimize prompt selection, leading to improved accuracy and significant reductions in token usage. Comprehensive evaluations across diverse benchmarks, including HumanEval, MBPP, and APPS, demonstrate the effectiveness of TITAN and PET-Select. TITAN achieves up to 7.6 percent improvement over existing zero-shot methods, while PET-Select enhances pass@1 accuracy by up to 1.9 percent and reduces token consumption by 49.9 percent.
This work represents a significant advancement in optimizing prompt engineering for code generation in large language models, offering a robust and automated solution for improving performance in complex and diverse programming tasks.

Item type: Item, Access status: Open Access
Explainability is a Game for Probabilistic Bisimilarity Distances (2025-07-23)
Nanah Ji, Anto; van Breugel, Franck
Software bugs cost trillions annually, requiring better bug detection tools. Testing is widely used but has limitations, especially in non-deterministic software, where code produces different outputs even with fixed inputs due to randomness and concurrency. Labelled Markov chains model randomness but suffer from the state space explosion problem, where the number of states grows exponentially with system complexity. One solution is to identify behaviorally equivalent states using probabilistic bisimilarity. However, this method is not robust: small changes in probabilities can affect equivalences. To address this, probabilistic bisimilarity distances, a quantitative generalization of probabilistic bisimilarity, were introduced. These distances have game-theoretic characterizations. This thesis illustrates how optimal policies, known as players' strategies, can explain distances. We formulate 1-maximal and 0-minimal policies and argue that they lead to better explanations. We present algorithms for these policies, prove an exponential lower bound for the 1-maximal algorithm, and show that symmetries simplify policies and, hence, explanations.

Item type: Item, Access status: Open Access
Guiding Expert Database Tuning with Explainable AI (2025-07-23)
Chai, Andrew Brian Frederick; Szlichta, Jarek
Modern database systems, such as IBM Db2, rely on cost-based optimizers to improve workload performance. However, their decision-making processes are difficult to interpret.
Tuning them for specific workloads remains challenging due to their complexity, numerous configuration options, and interaction with unique workload characteristics. Additionally, database systems increasingly rely on black-box machine learning models within the optimizer and automatic tuning tools. These black-box models lack interpretability, hindering expert trust and debugging. We propose GEX, a system that provides interpretable insights into database optimizer behavior using explainable AI techniques. We adapt XAI techniques for generating perturbation-based saliency maps from surrogate models to the domain of SQL queries. With GEX, we propose a framework for how saliency scores can be used to guide experts in system tuning tasks such as statistical view creation, configuration parameter adjustment, and query rewrite. We demonstrate the ability of GEX to capture and communicate optimizer behaviour through experimental evaluation in these tasks using the TPC-DS benchmark and IBM Db2.

Item type: Item, Access status: Open Access
Tuning Big Data Systems Via Deep Learning (2025-07-23)
Bianchi, Alexander Robert; Szlichta, Jarek
Modern database systems, including IBM Db2, have numerous parameters, or “knobs,” that require precise configuration to achieve optimal workload performance. Even for experts, manually “tuning” these knobs is a challenging process. We present Db2une, an automatic query-aware tuning system that leverages deep learning to maximize performance while minimizing resource usage. Db2une uses a specialized transformer-based query-embedding pipeline and graph neural networks to feed as input to a stability-oriented deep reinforcement learning model. In Db2une, we introduce a multi-phased, database meta-data driven training approach (which incorporates cost estimates, interpolation of these costs, and database statistics) to efficiently discover optimal tuning configurations without the need to execute queries.
Thus, our model scales to large workloads where executing queries repeatedly would be prohibitively expensive. Through experimental evaluation, we demonstrate Db2une’s efficiency and effectiveness over a variety of workloads. We show that Db2une provides recommendations surpassing those of other state-of-the-art systems and IBM experts.

Item type: Item, Access status: Open Access
Visual Element Property Graphs for Bridging the Symbol Description-Recognition Gap (2025-07-23)
Dehnen, Nicholas Alexander; An, Aijun
This thesis addresses the semantic gap between visual perception and functional significance of symbols used in road vehicles. It presents a novel approach that enables users to identify and understand automotive symbols by describing what they visually perceive, translating visual descriptions into practical implications. A system combining a property graph representation of visual components and semantic relationships with a language model-powered natural language interface is developed. This method explicitly models relationships between visual elements and interpretations, differing from end-to-end vision-language models. Evaluations, using automated metrics and human assessment, demonstrate performance exceeding baseline large language models, with a BERTScore F1 of 0.765, compared to the best baseline's 0.597. Analysis of visual symbol queries reveals human description tendencies, favoring intuitive analogies and basic shapes.
Contributions include a symbol decomposition methodology, an advanced property graph schema, natural language query processing, and evidence supporting structured knowledge representation for symbol description-recognition, applicable beyond automotive interfaces.

Item type: Item, Access status: Open Access
Refining the sample complexity of comparative learning (2025-07-23)
Rahmanian Ashkezari, Sajad; Urner, Ruth
The PAC (Probably Approximately Correct) framework is a well-established theoretical framework for analyzing the statistical (and sometimes computational) complexity of machine learning tasks. Comparative learning is a recently introduced variation of the PAC framework that interpolates between the two standard extreme settings of realizable and agnostic PAC learning. In comparative learning, the labeling is assumed to be from one hypothesis class (the source) while the learner's performance is to be measured against another hypothesis class (the benchmark). This setup allows for incorporating more specific prior knowledge into PAC-type learning bounds, which are known to be otherwise overly pessimistic. In this work, we study the sample complexity of a variation of this setting, which we call proper comparative learning, where we require the learning algorithm to output a hypothesis from the benchmark class. This setting represents model distillation tasks, where a predictor with specific requirements (e.g., interpretability) is trained on the labels from another model.

Item type: Item, Access status: Open Access
SWE-Bench+: Enhanced Coding Benchmark for LLMs (2025-07-23)
Aleithan, Reem; Wang, Song
Large Language Models (LLMs) in Software Engineering (SE) can offer valuable assistance for coding tasks. To facilitate a rigorous evaluation of LLMs in practical coding contexts, Jimenez et al. introduced the SWE-bench dataset, which comprises 2,294 real-world GitHub issues. Several impressive LLM-based toolkits have recently been developed and evaluated on this dataset.
However, a systematic evaluation of the quality of SWE-bench remains missing. In this thesis, we address this gap by presenting an empirical analysis of the SWE-bench dataset. We manually screen instances where SWE-Agent + GPT-4 successfully resolved the issues by comparing model-generated patches with developer-written pull requests. Our analysis reveals two critical issues: (1) 33.47% of patches have solution leakage, where the fix is directly or indirectly revealed in the issue report or comments; and (2) 24.70% of successful patches are suspicious due to weak test cases that fail to detect incorrect, incomplete, or irrelevant fixes. Filtering out these problematic instances drops SWE-Agent + GPT-4’s resolution rate from 12.47% to 4.58%. Motivated by these findings, we propose SWE-Bench+, a refined version of the benchmark using two LLM-based tools: SoluLeakDetector to identify solution-leak issues and TestEnhancer to reduce weak test cases. SWE-Bench+ identifies solution-leak issues with 86% accuracy and reduces suspicious patches by 19%. To reduce the risk of potential data leakage, we collect a new set of post-cutoff GitHub issues. We then evaluate models on this dataset, observing a consistent performance drop across all models. This highlights the impact of solution leakage and weak tests in inflating resolution rates in current benchmarks.

Item type: Item, Access status: Open Access
Analyzing Turning Movement Counts at Intersections through Multi-Camera Ground-Plane Reasoning (2025-07-23)
Pakdamansavoji, Sajjad; Elder, James Harvey
Classifying vehicle trajectories at intersections, known as turning movement counts (TMC), is a critical task for traffic management. Traditional approaches rely on a detect, track, count (DTC) paradigm that employs rule-based methods on image-plane data from a single camera. In this thesis, we propose a novel maximum likelihood approach that operates on the ground plane to perform trajectory classification.
Our method demonstrates superior performance compared to image-plane techniques and shows promising preliminary results for integrating multi-camera data on the ground plane at the counting stage.

Item type: Item, Access status: Open Access
Assessing and Enhancing the Quality of News Headlines Using Machine Learning (2025-07-23)
Omidvar, Amin; An, Aijun
Headlines play a pivotal role in capturing readers' attention, and their quality is critical for engaging audiences. In this thesis, we propose various solutions to assist news media in crafting high-quality headlines. First, we delve into headline quality assessment, devising four innovative indicators that automatically evaluate headlines' quality. Our proposed model empowers news outlets to automatically determine the quality of published headlines. We evaluate the quality of headlines from The Globe and Mail using these four indicators and provide insightful results. We then use this labeled data to train our novel headline quality prediction model to predict the quality of unpublished headlines, assisting journalists in selecting high-quality headlines for their articles. Furthermore, we facilitate journalists' work by recommending high-quality headlines for their articles. To accomplish this, we propose a headline generative model that learns to generate headlines using Reinforcement Learning (RL). Our model can be optimized not only with respect to a non-differentiable metric but also based on a combination of two different metrics simultaneously. Additionally, we enhance headline generation in terms of both training speed and the quality of the generated headlines by proposing a novel architecture utilizing state-of-the-art transformer models. In our architecture, after generating candidate headlines using state-of-the-art models, we select the most popular headline using our headline popularity prediction model.
Moreover, we establish a popularity benchmark for evaluating headline generation models based on their ability to generate popular headlines. Lastly, we forecast changes in how people consume news articles, envisioning a shift towards interacting with agents instead of navigating news portals. To address existing challenges and enable this transition, we introduce Semantic In-Context Learning (S-ICL), an innovative approach enabling Large Language Models (LLMs) to deliver updated news in a conversational format, enhancing user engagement and comprehension for news media.

Item type: Item, Access status: Open Access
Perception of Materials in Virtual Reality Based on Their Audiovisual Properties (2025-07-23)
Koppisetty, Harshitha; Allison, Robert S.
This study examined the effects of cue conflicts between auditory and visual material information in a virtual environment. All combinations of impact sounds and visual textures for four materials were paired, creating sixteen conditions. Participants, wearing a VR headset, viewed the rendered target object and heard the paired sound when it was struck with a virtual metal rod. To study the effect of agency, half the trials involved an agent striking the target (agent-interaction), while in the other half, participants struck it themselves (self-interaction). Once they classified the material of the target object, their responses and response times were recorded. Results show that participants relied largely on auditory properties when classifying materials, that no significant difference was found between agent-interaction and self-interaction modes, and that potential audiovisual illusions were observed in four conditions. These findings underscore the importance of high-quality auditory cues in VR, as discordant signals can distort perceived material properties.