Computer Science

Permanent URI for this collection: https://hdl.handle.net/10315/38508

Recent Submissions

Now showing 1 - 20 of 135
  • Item type: Item , Access status: Open Access ,
    Inflammatory Biomarker Analysis from Wearable Sweat Patches via Smartphone-Based Image Processing
    (2026-03-10) Rozenblat, Shahak; Salahandish, Neda
    The detection of systemic inflammation through inflammatory biomarkers plays a critical role in identifying and managing pathological conditions. Conventional measurement of inflammatory biomarkers relies on invasive procedures such as blood sampling, which limits accessibility and requires frequent monitoring. Wearable sweat sensors offer a promising noninvasive alternative; however, robust interpretation of their visual signals remains challenging outside of laboratory environments. This study presents the first fully automated computational pipeline that translates colorimetric signals from a wearable sweat sensor into quantitative measurements of inflammatory biomarkers using smartphone-acquired images. The proposed approach enables reliable analysis under variable imaging conditions, supporting point-of-care (POC) inflammatory monitoring. Our results show that the pipeline significantly reduces measurement variability, with up to a 70% reduction in the variability induced by different lighting conditions. Additional experiments demonstrate robustness across different smartphones and image capture distances, with end-to-end processing completed within a few seconds. Furthermore, validation using data from human participants with eczema demonstrates that the system can distinguish between healthy individuals and those exhibiting elevated inflammatory biomarker levels, with performance comparable to the gold standard, enzyme-linked immunosorbent assay (ELISA). The complete pipeline was integrated into a mobile application, enabling near real-time analysis and supporting practical POC deployment.
  • Item type: Item , Access status: Open Access ,
    PAMBA: Partition-aware and Multi-SLO Batching for Serverless Inference on Heterogeneous Clouds
    (2026-03-10) Abedini, Alireza; Khazaei, Hamzeh
    Serverless computing offers elasticity and fine-grained billing for machine learning inference, but efficiently supporting large models under diverse latency service-level objectives (SLOs) remains challenging. In particular, existing approaches face a wide cost–performance gap between CPU and GPU execution, while batching and resource selection become increasingly complex under heterogeneous workloads and multiple SLOs. This thesis presents PAMBA, a partition-aware and multi-SLO batching system for serverless inference on heterogeneous clouds. PAMBA combines multi-SLO batching with analytical latency and cost models for CPU, GPU, and partitioned execution, enabling consistent provisioning decisions across different execution modes. To bridge the CPU–GPU gap, the system employs a customized partitioning strategy derived from latency-optimal partitioning, adapted to satisfy serverless resource constraints and jointly consider latency feasibility and per-request cost. This adaptation allows partitioned execution to emerge as an effective intermediate regime between monolithic CPU and GPU deployments. By jointly optimizing execution mode selection, batching, and resource allocation, PAMBA enables flexible inference deployment across a wide range of SLOs and arrival rates, including scenarios where GPU resources are unavailable or inefficiently utilized. Experimental results on convolutional neural networks demonstrate that PAMBA identifies distinct execution frontiers and reduces inference cost compared to existing serverless batching techniques, while maintaining SLO feasibility across heterogeneous workloads.
  • Item type: Item , Access status: Open Access ,
    OptiServe: Cost-Aware, Performance-Driven, and Accuracy-Tuned Serverless Applications with ML Workloads
    (2026-03-10) Boukani, Arian; Khazaei, Hamzeh
    Serverless computing has emerged as a popular cloud paradigm due to its seamless scalability and cost-efficient, pay-as-you-go pricing model. Its potential to support machine learning (ML) inference workloads, including generative AI tasks, has led to growing adoption of ML functions within serverless applications. A key challenge, however, is selecting suitable ML models that balance execution time, deployment cost, and inference accuracy in latency- and cost-sensitive environments. In this study, we present a framework for optimizing serverless applications that incorporate ML components through tri-objective optimization. We develop high-fidelity analytical models, augmented with lightweight profiling, to capture the trade-offs among cost, performance, and accuracy across different model choices. These models serve as the foundation for guiding ML model selection and deployment strategies to meet application-specific service-level objectives. We validate our framework through real-world experiments on AWS using real serverless applications. Furthermore, we demonstrate its practicality by performing extensive what-if analyses, exploring a wide range of application scenarios and configurations, in under a minute. Our extensive experiments on real-world applications show that OptiServe recommends memory and ML model configurations that achieve over 95% of the accuracy of ideal configurations in 89.64% of cases, enabling efficient, low-cost deployments while maintaining model accuracy and meeting performance targets.
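As a rough illustration of the three-way trade-off among execution time, cost, and accuracy described in the abstract above, the sketch below keeps only the candidate configurations that no other candidate beats on all three objectives. Pareto filtering is a generic technique used here for illustration; the abstract does not say that OptiServe works this way, and the `Config` type, the configuration names, and every number are hypothetical.

```python
from typing import NamedTuple


class Config(NamedTuple):
    """A hypothetical deployment option (not taken from the thesis)."""
    name: str
    latency_ms: float   # lower is better
    cost_usd: float     # lower is better
    accuracy: float     # higher is better


def dominates(a: Config, b: Config) -> bool:
    """True if a is at least as good as b on every objective and better on one."""
    no_worse = (a.latency_ms <= b.latency_ms
                and a.cost_usd <= b.cost_usd
                and a.accuracy >= b.accuracy)
    strictly_better = (a.latency_ms < b.latency_ms
                       or a.cost_usd < b.cost_usd
                       or a.accuracy > b.accuracy)
    return no_worse and strictly_better


def pareto_front(configs: list[Config]) -> list[Config]:
    """Keep the configurations that no other configuration dominates."""
    return [c for c in configs
            if not any(dominates(other, c) for other in configs)]


# Made-up candidates: "wasteful" is slower, pricier, and less accurate than "large".
candidates = [
    Config("small", latency_ms=100.0, cost_usd=1.0, accuracy=0.80),
    Config("large", latency_ms=300.0, cost_usd=3.0, accuracy=0.95),
    Config("wasteful", latency_ms=350.0, cost_usd=3.5, accuracy=0.90),
]
print([c.name for c in pareto_front(candidates)])  # ['small', 'large']
```

A service-level objective would then pick one point from the remaining set, for example the cheapest configuration whose latency is under a target.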
  • Item type: Item , Access status: Open Access ,
    Foundation Models for Analyzing Single-Cell RNA Sequence data
    (2026-03-10) Naziri, Amirreza; Seyyed-Kalantari, Laleh
    Single-cell RNA sequencing (scRNA-seq) measures gene expression in individual cells, offering deep insight into cellular heterogeneity, development, and disease. Transformer-based foundation models have become central to single-cell RNA-sequencing analysis, yet most rely on uniform random masking during pretraining, a strategy misaligned with the sparsity, heterogeneity, and zero inflation characteristic of scRNA-seq data. To assess how these models behave under realistic biological variation, we first perform a comprehensive evaluation of four widely used single-cell foundation models (Geneformer, scBERT, scFoundation, and scGPT) across three diverse datasets. This benchmarking reveals substantial variability in model performance, including systematic weaknesses on rare cell populations and degraded accuracy in clinically challenging conditions. Motivated by the broader limitations of random masking in foundation models, we introduce Multinomial Attention Masking (MAM), a biologically informed masking strategy that leverages trainable latent representations and cross-attention to identify informative gene positions during pretraining. Across all datasets, models pretrained with MAM consistently achieve higher downstream cell-type classification accuracy than those trained with uniform masking and, in several cases, outperform the original pretrained backbones. Biological validation further demonstrates that MAM preferentially selects highly expressed and functionally meaningful genes, indicating that its improvements stem from capturing biologically relevant structure rather than from increased algorithmic complexity. This work improves the reliability and utility of single-cell foundation models for researchers and clinicians alike.
  • Item type: Item , Access status: Open Access ,
    Towards Agentic Vision Language Models for Question Answering on Interactive Dashboard
    (2026-03-10) Kartha, Aaryaman Sudhir; Prince, Enamul Hoque
    Multimodal models, specifically Vision Language Models (VLMs), have shown increasing capabilities in data-visualization-oriented downstream tasks, achieving performance saturation in shorter intervals of time. Consequently, focus has shifted to assessing their potential towards new frontiers, specifically interactive environments. Various benchmarks center around data visualization question answering tasks on static visualizations, and such rudimentary approaches do not reflect real-world analysis scenarios where extensive decision making is required. Although dashboards are commonplace tools in various industries, little work has evaluated the ability of VLMs to traverse and reason with them. To tackle these limitations, this thesis presents DashboardQA, a novel benchmark for interactive dashboard question answering. Overall, 292 tasks encompassing 405 QA pairs are presented from 5 diverse category types, with 112 carefully chosen dashboards represented. Experimental results show this benchmark is challenging for the various types of VLMs assessed, with the best model achieving 38.69%.
  • Item type: Item , Access status: Open Access ,
    Designing an Interactive Tool for Mnemonics Creation and Knowledge Retention
    (2026-03-10) Ejaz, Sarah; Oyibo, Kiemute
    Research shows that mnemonics are an effective learning technique, yet few tools support mnemonics-based long-term learning. We designed and evaluated a mnemonics-creation tool, the SAVE Tool, to promote active learning and retrieval practice. Forty-five participants were assigned to experimental and control groups and viewed a 10-minute biology lecture covering six topics. They completed recall and recognition tasks after a 45-minute practice session (T1), one week later without revision (T2), and after a 15-minute revision (T3). Results showed that the SAVE Tool group consistently outperformed the control group in recall across all time points, with statistically significant differences at T3 and for more difficult topics such as Krebs Cycle Substrates and Cranial Nerves. No significant group differences were found for recognition. These findings suggest mnemonics-based tools can enhance long-term learning without hindering understanding and should be integrated into memory-intensive courses.
  • Item type: Item , Access status: Open Access ,
    Designing Mnemonics Serious Games to Promote Knowledge Retention in Memory-Intensive Courses
    (2026-03-10) Fung, Kingson; Oyibo, Kiemute
    The increased difficulty of memory-intensive courses due to many factors necessitates using technological tools to promote retrieval practice and long-term learning. Hence, a mnemonics game, based on the RADAR framework proposed by Oyibo, was implemented to foster knowledge retention. Fifty-two students, comprising an experimental group (n = 30) and a control group (n = 22), were recruited for a study, which involved watching a 10-minute Biology lecture on Biology Organization, Cranial Nerves, and Krebs Cycle and taking three tests: immediately after a 45-minute preparation, and after a one-week gap both before and after a 15-minute revision. In all three tests, students who used the RADAR game performed better in recall than those who did not. Coupled with the study participants stating they found the game easy to use, enjoyable, useful, and trustworthy, and their willingness to adopt it, the experimental group’s better performance highlights the need to incorporate mnemonics-based games in memory-intensive courses to promote long-term learning.
  • Item type: Item , Access status: Open Access ,
    A Concurrent List with Adaptive Bounds
    (2026-03-10) Asbell, Shalom Moshe; Ruppert, Eric
    Few concurrent data structures adapt dynamically to access patterns, and those that do lack formal performance guarantees. This thesis introduces the first self-adjusting concurrent data structure with an adaptive bound analogous to those of optimal self-adjusting sequential structures. We present a lock-free move-to-front (MTF) list supporting a dynamic set of keys, where an operation op on a key k runs in an amortized number of steps proportional to the size of its working set plus a contention term. The working set of op is the set of keys accessed since the last operation on key k, and contention is the number of operations that run concurrently with op. We further show that the list performs at most twice as much work as the best possible fixed list for a set of searches, up to contention. Finally, we prove that the number of nodes reachable from shared memory is bounded by the number of keys in the set plus contention.
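To make the working-set idea in the abstract above concrete, here is a minimal sketch of a sequential, single-threaded move-to-front list. It is not the lock-free structure the thesis presents and has no contention term; the class and function names are assumptions made for illustration. In this sequential setting, the number of positions inspected by an access equals the working-set size plus one.

```python
class MoveToFrontList:
    """Sequential move-to-front list (illustrative only, not lock-free)."""

    def __init__(self):
        self._keys = []

    def access(self, key):
        """Find key, move it to the front (inserting it if absent),
        and return the number of list positions inspected."""
        try:
            index = self._keys.index(key)
            cost = index + 1
            self._keys.pop(index)
        except ValueError:
            # Scanned the whole list without a hit; count one step for the insert.
            cost = len(self._keys) + 1
        self._keys.insert(0, key)
        return cost


def working_set_size(history, position):
    """Number of distinct keys accessed since the previous access to
    history[position] (or since the start, if there was none)."""
    key = history[position]
    seen = set()
    for earlier in reversed(history[:position]):
        if earlier == key:
            break
        seen.add(earlier)
    return len(seen)


history = "abcaac"
mtf = MoveToFrontList()
costs = [mtf.access(key) for key in history]
print(costs)  # [1, 2, 3, 3, 1, 2]
print([working_set_size(history, i) for i in range(len(history))])  # [0, 1, 2, 2, 0, 1]
```

Recently used keys sit near the front, so repeated accesses to a small set of keys stay cheap regardless of how many keys the list holds.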
  • Item type: Item , Access status: Open Access ,
    Towards Closed-Loop Sleep Monitoring in Parkinson’s Disease: Self-Supervised Learning Strategies for Sleep Stage Classification
    (2026-03-10) Menguc, Kristal Doga; Zylberberg, Joel
    Parkinson’s disease (PD) involves severe sleep disturbances that may accelerate neurodegeneration. Closed-loop deep brain stimulation (DBS) is a promising therapeutic solution but requires accurate, real-time sleep-stage classification from subthalamic nucleus signals, a task where conventional models generalize poorly. To address pronounced class imbalance and improve cross-patient generalization, this work introduces a self-supervised transformer framework. The model employs a masked autoencoder strategy, pretrained on large public EEG/ECoG datasets from healthy subjects to learn balanced representations of all sleep stages, thereby improving discrimination of rare classes. While pretraining yielded limited overall improvement, it specifically enhanced N3 stage identification and next-sleep-stage prediction. Contrastive self-supervision (70%) significantly outperformed a reconstruction-based approach (62%). Furthermore, spectral feature extraction proved more effective than temporal CNN features for distinguishing commonly mispredicted stages. Future work will focus on hybrid reconstruction-contrastive losses, incorporating spectral features into forecasting, and extending the forecasting horizon to predict multiple subsequent stages for proactive neuromodulation.
  • Item type: Item , Access status: Open Access ,
    A CNN–LSTM–Attention Hybrid Architecture for Real-Time Intrusion Detection at the Data Link Layer
    (2026-03-10) Ahmadnejad Roudsari, Amirhossein; Habibi Lashkari, Arash
    Data Link Layer (Layer 2) security remains one of the most underexplored areas in modern network intrusion detection research, despite its critical role as the foundation of reliable communication between networked devices. Attacks at this layer, such as ARP spoofing, MAC flooding, VLAN hopping, and DHCP starvation, can compromise entire networks before higher-layer defenses activate. Existing intrusion detection systems predominantly focus on network or transport layers, leaving a significant gap in early-stage threat prevention. To address this limitation, this thesis proposes a memory-efficient hybrid deep learning architecture that integrates Convolutional Neural Networks (CNNs), Long Short-Term Memory (LSTM) units, and an Attention mechanism for real-time detection of Layer 2 intrusions. A novel dataset, BCCC-DLLayer-IDS-2025, was developed as part of this research, comprising over 4.6 million labeled flow records collected in a controlled experimental environment. The dataset includes eleven distinct attack types spanning spoofing, flooding, and protocol manipulation scenarios, along with benign traffic, providing a comprehensive foundation for training and benchmarking Layer 2 intrusion detection systems. The proposed CNN–LSTM–Attention architecture combines spatial and temporal feature extraction with an adaptive focus mechanism, enabling effective modeling of short-term dependencies in network traffic while reducing redundancy. The model achieves an F1-score of 99.67% with only 2.1 million parameters and a latency below 100 milliseconds, offering a 60% lower computational cost than conventional deep learning models. Extensive experiments under varying traffic conditions and noise levels confirm the model’s robustness, generalizability, and suitability for real-time deployment on resource-constrained edge and IoT devices.
  • Item type: Item , Access status: Open Access ,
    Monocular Camera-based Road Segmentation Using Geometric Cues
    (2026-03-10) Cheng, Gong; Elder, James
    Vision-based road segmentation aims to identify drivable road regions in images captured by vehicle-mounted cameras. With the rapid development of autonomous driving, such methods have attracted substantial attention from both academia and industry due to their research significance and commercial potential. However, a key challenge lies in achieving robust adaptation—that is, effectively generalizing to diverse environmental and operational conditions. In my Ph.D. research, I have concentrated on leveraging geometric information to advance monocular-camera-based road segmentation and adaptation. Specifically, I have completed three independent projects: (1) a supervised road segmentation approach that fuses geometric cues with appearance cues to enhance segmentation accuracy, (2) an unsupervised domain adaptation method that employs a novel geometry-guided strategy to reduce domain shift between source and target data, and (3) another unsupervised domain adaptation approach that uses known camera parameters and online estimation of the camera pan and tilt to better align imagery across and within datasets, improving accuracy and generalization. These contributions collectively push forward the development of robust, geometry-driven solutions for road scene understanding.
  • Item type: Item , Access status: Open Access ,
    EAGLE-APT: Edge-Aware Provenance Graph Learning with Node Encoding for Advanced Persistent Threat Detection and Attribution from System Audit Log
    (2026-03-10) Abbaszadeh Darban, Reza; Habibi Lashkari, Arash
    Advanced Persistent Threats (APTs) represent some of the most challenging forms of cyberattacks, characterized by stealth, persistence, and multi-stage operations that evade traditional defenses. Detecting and attributing such campaigns to a known APT group requires methods that can capture long-term coordinated malicious activity within complex system interactions. This research introduces EAGLE-APT, an Edge-Aware Provenance Graph Learning framework with Node Encoding for APT detection and attribution from system audit logs. The proposed architecture comprises five core components: a provenance graph generator, a node feature extractor, a type-specific feature encoder, a malicious node detector, and an attribution module. The process begins with the provenance graph generator, which converts raw audit logs into heterogeneous provenance graphs that capture system entities and their causal relationships. These graphs are then enriched by the node feature extractor, which incorporates both semantic and structural information to represent the behavior of each entity more effectively. Next, the type-specific feature encoder transforms heterogeneous node features into a unified embedding space, ensuring that diverse data types contribute meaningfully to the representation. Building on this foundation, the malicious node detector utilizes an edge-aware graph neural network to identify suspicious nodes, taking into account both the contextual importance of neighbors and the nature of their connections. Finally, the attribution module analyzes the detected malicious subgraphs and classifies them into known APT groups, offering a foundation for informed response and defense strategies. To support evaluation, a comprehensive dataset of simulated APT campaigns was generated in a controlled enterprise environment, capturing realistic multi-stage attack behaviors. 
    Together, these contributions provide both a novel framework for end-to-end detection and attribution and a reproducible dataset that can serve as a basis for advancing future research in APT defense.
  • Item type: Item , Access status: Open Access ,
    Geometry-Aware Diffusion Models for Multiview Scene Inpainting
    (2026-03-10) Salimi, Ahmad; Derpanis, Konstantinos
    In this thesis, we focus on 3D scene inpainting, where parts of an input image set, captured from different viewpoints, are masked out. The main challenge lies in generating plausible image completions that are geometrically consistent across views. Most recent work addresses this challenge by combining generative models with a 3D radiance field to fuse information across a relatively dense set of viewpoints. However, a major drawback of these methods is that they often produce blurry images due to the fusion of inconsistent cross-view images. To avoid blurry inpaintings, we eschew the use of an explicit or implicit radiance field altogether and instead fuse cross-view information in a learned space. In particular, we introduce a geometry-aware conditional generative model, capable of multiview consistent inpainting using reference-based geometric and appearance cues. A key advantage of our approach over existing methods is its unique ability to inpaint masked scenes with a limited number of views (i.e., few-view inpainting), whereas previous methods require relatively large image sets for their 3D model fitting step. Empirically, we evaluate and compare our scene-centric inpainting method on two datasets, SPIn-NeRF and NeRFiller, which contain images captured at narrow and wide baselines, respectively, and achieve state-of-the-art 3D inpainting performance on both. Additionally, we demonstrate the efficacy of our approach in the few-view setting compared to prior methods.
  • Item type: Item , Access status: Open Access ,
    Modern Deep Learning Methods for Time Series Analysis
    (2026-03-10) Mootoo, Xavier Stephen; Tabassum, Hina
    Deep learning methods for time series analysis have become prominent tools in state-of-the-art predictive modeling, with applications spanning finance, transportation, energy, healthcare, climate science, and numerous other domains. In this thesis, we explore modern deep learning techniques that address two key challenges common across many domains: handling variable-structure inputs and quantifying prediction uncertainty, with the goal of building models that are both robust and adaptable. We demonstrate these advances across two distinct domains: electromagnetic field (EMF) exposure time series forecasting in modern wireless networks and variable-length time series classification (VTSC) in healthcare. The first contribution introduces EMForecaster, a novel deep learning framework designed for accurate EMF exposure prediction. The architecture employs hierarchical patching to capture temporal patterns at multiple scales, complemented by reversible instance normalization and mixing operations for efficient feature extraction. To enhance reliability, EMForecaster incorporates conformal prediction mechanisms that provide principled uncertainty quantification, enabling trustworthy forecasts with guaranteed coverage rates. A new Trade-off Score metric is developed to balance prediction reliability against interval width. Empirical evaluations demonstrate EMForecaster's superior performance across diverse EMF datasets, with improvements of up to 53.97% over transformer architectures for point forecasts, while maintaining optimal balance between prediction interval coverage and width. The second contribution presents a Stochastic Sparse Sampling (SSS) framework for variable-length time series classification, a prevalent challenge in healthcare applications. SSS addresses the inherent variability in clinical time series by intelligently sampling fixed windows to compute local predictions, which are then aggregated and calibrated to form robust global classifications.
    This approach is validated on the task of seizure onset zone (SOZ) localization using intracranial electroencephalography (iEEG) recordings from the Epilepsy iEEG Multicenter Dataset. SSS demonstrates superior performance compared to state-of-the-art methods across most medical centers, particularly excelling in out-of-distribution scenarios with previously unseen data sources. Additionally, SSS provides valuable post-hoc insights by visualizing temporally averaged local predictions throughout the signal. Together, these methodologies advance the state of the art in time series analysis through innovative deep learning techniques, uncertainty quantification, and interpretability methods, offering significant improvements for both forecasting and classification tasks in real-world wireless network and healthcare applications.
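The abstract above mentions conformal prediction with guaranteed coverage but does not say which variant EMForecaster uses. As one possibility, the sketch below shows split conformal prediction: absolute errors on a held-out calibration set determine a radius that is added around each new point forecast. The function names and the toy residuals are assumptions for illustration.

```python
import math


def conformal_radius(calibration_residuals, alpha):
    """Radius r such that [prediction - r, prediction + r] covers the true
    value with probability at least 1 - alpha, assuming the calibration and
    test points are exchangeable (split conformal prediction)."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    n = len(calibration_residuals)
    # Rank of the residual to use: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this coverage level.
        return math.inf
    return sorted(calibration_residuals)[k - 1]


def prediction_interval(point_forecast, radius):
    """Symmetric interval around a point forecast."""
    return (point_forecast - radius, point_forecast + radius)


# Toy absolute errors |y - y_hat| from a hypothetical calibration set.
residuals = [3, 1, 4, 2]
radius = conformal_radius(residuals, alpha=0.5)
print(prediction_interval(10.0, radius))  # (7.0, 13.0)
```

A smaller `alpha` demands higher coverage and therefore a wider interval, which is the tension between reliability and interval width that the abstract's Trade-off Score is described as balancing.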
  • Item type: Item , Access status: Open Access ,
    Deep Learning Models for Detecting Online Harmful Content
    (2025-11-11) Wei, Feng; Nguyen, Uyen T.
    Deep learning (DL) has emerged as a transformative technology with substantial impact across various domains, including cybersecurity. This dissertation leverages deep learning methods and their applications to address increasingly sophisticated cyber threats. DL methods are capable of learning complex and abstract features from large-scale data, making them well-suited for identifying and mitigating cyber threats that traditional methods might miss. This dissertation focuses on the practical implementation and evaluation of DL models for detecting real-world cybersecurity threats, namely, clickbait, Twitter bots and SMS spam. Specifically, we propose:
    - a novel attention-based neural network model named Knowledge-Enhanced Clickbait Detector (KED) that uses linguistic knowledge graphs built from WordNet to guide the attention mechanisms. The proposed neural network can effectively capture discriminative features from local and global similarities via the proposed knowledge-enhanced attention mechanisms. Moreover, we incorporate human semantic knowledge into the neural network and its attention mechanisms to better capture semantic correlations of headline-article word pairs.
    - a novel recurrent neural network (RNN) model to distinguish Twitter bots from human accounts based on textual content of their tweets. We use several types of linguistic embeddings to encode tweets, namely, word embeddings, character embeddings, part-of-speech embeddings, and named-entity embeddings. We avoid using handcrafted features, which require time-consuming and labor-intensive feature engineering. This advantage allows for faster and easier implementation and deployment of the bot detection scheme.
    - a novel lightweight deep neural model called Lightweight Gated Recurrent Unit (LGRU) for SMS spam detection. We incorporate enhancing semantics retrieved from external knowledge to assist in understanding SMS text inputs for more accurate detection. In addition, the lightweight model illustrates a method to minimize unnecessary complexity in training recurrent models without compromising the performance, which we believe is applicable to many other complex recurrent models for other applications.
    Experimental results show that the above models outperform their counterparts, including state-of-the-art models/systems and other baseline models, in terms of predictive performance and/or running time. The proposed models provide robust, scalable, and real-time security solutions that can adapt to the rapidly changing landscape of cyber threats.
  • Item type: Item , Access status: Open Access ,
    Simulation-Based Evaluation of Transaction Finality in Bitcoin Using CNSIM
    (2025-11-11) Radjou, Amirreza; Liaskos, Sotirios
    Blockchain consensus protocols must be thoroughly evaluated for security and resilience, but their large scale makes experimental testing in a lab setting challenging. While numerous simulators exist, there is a need for a more general framework that can translate simulation data into useful and comparable metrics. This thesis addresses this gap by adopting CNSim, a simulator developed at York University that introduces a finality-based approach to evaluating consensus networks. To study the Bitcoin protocol, CNSim was enhanced by designing and implementing a novel framework for modeling adversarial behaviors. Specifically, the Majority Attack was implemented to create a detailed simulation for double-spending scenarios. Using this extended simulator, a systematic evaluation was conducted to assess the attack's impact on transaction finality, quantifying how network resilience degrades as malicious hash power increases. The findings provide valuable insights into the practical security limitations of the Bitcoin protocol and successfully demonstrate the utility of a finality-based methodology for analyzing blockchain consensus mechanisms.
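The thesis above studies double-spending through simulation. For orientation only, the sketch below computes the analytical estimate from the Bitcoin whitepaper of the probability that an attacker controlling a fraction `q` of the hash power eventually overtakes the honest chain after the recipient waits for `z` confirmations. It is not part of CNSim and does not reproduce the thesis's results; the function name is an assumption.

```python
import math


def attacker_success_probability(q, z):
    """Whitepaper estimate of the chance that an attacker with hash-power
    share q catches up after the recipient waits for z confirmations."""
    if not 0.0 <= q <= 1.0 or z < 0:
        raise ValueError("q must be in [0, 1] and z must be non-negative")
    p = 1.0 - q  # honest share of the hash power
    if q >= p:
        # A majority attacker catches up with certainty.
        return 1.0
    ratio = q / p
    lam = z * ratio  # expected attacker progress while z honest blocks are mined
    probability = 1.0
    for k in range(z + 1):
        poisson = math.exp(-lam) * lam ** k / math.factorial(k)
        probability -= poisson * (1.0 - ratio ** (z - k))
    return probability


for confirmations in (0, 1, 2, 6):
    print(confirmations, attacker_success_probability(0.1, confirmations))
```

The estimate falls quickly as `z` grows when `q` is small and reaches 1 once the attacker holds half or more of the hash power, which is the majority-attack regime the abstract describes.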
  • Item type: Item , Access status: Open Access ,
    Evaluating and Enhancing LLMs for Deep Learning Code Generation with DL-Bench
    (2025-11-11) DaghighFarsoodeh, Alireza; Pham, Hung Viet
    Large Language Models (LLMs) have recently demonstrated remarkable capabilities in automated code generation, yet their performance on domain-specific tasks such as deep learning (DL) pipelines remains underexplored. This thesis addresses this gap by introducing DL-Bench, the first comprehensive benchmark dedicated to evaluating LLMs on DL-specific code generation. DL-Bench comprises 520 carefully curated function-level tasks spanning all stages of the machine learning workflow, including data pre- and post-processing, model construction, training, inference, and evaluation, and is systematically categorized by pipeline stage, task type, and input modality. This fine-grained design enables detailed performance analysis and exposes unique challenges of DL code generation, such as tensor shape mismatches, framework-specific errors, and brittle reliance on phrasing. Building on this benchmark, we further investigate robustness strategies for LLMs by proposing a prompt mutation pipeline combined with dual execution agreement. The pipeline systematically generates semantically equivalent prompt variations through lexical, grammatical, and naming transformations, which are then paired with model-generated test cases to diversify candidate solutions. Using a dual agreement framework, correct solutions are identified by their consistent success across test suites, mitigating common misinterpretations. To validate this approach, we evaluate three state-of-the-art LLMs, O4-Mini, DeepSeek R1 Basic, and Gemini 2.5 Pro, exclusively on DL-Bench. Results show that while baseline performance on DL-Bench is substantially lower than on general-purpose benchmarks, prompt mutations consistently yield measurable improvements (up to +2.9% pass@1), demonstrating their value in uncovering alternative correct solutions. 
    Overall, this thesis makes three key contributions: (i) the release of DL-Bench as a domain-specific, fine-grained benchmark for DL code generation, (ii) a systematic analysis of LLM weaknesses in DL contexts supported by a taxonomy of mutation effects, and (iii) the design and evaluation of a mutation-based dual agreement framework that enhances LLM reliability. These contributions provide both practical evaluation tools and methodological insights for advancing LLMs in specialized scientific programming domains. Future directions include scaling DL-Bench with multi-modal tasks, maintaining it as a live benchmark to track recency effects, and incorporating broader metrics such as code efficiency and maintainability.
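The abstract above reports results as pass@1 but does not state how the metric was computed. A widely used unbiased estimator of pass@k, assuming `n` generated samples per task of which `c` pass the tests, is 1 - C(n - c, k) / C(n, k); the sketch below implements it. The function names and the per-task counts are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Probability that at least one of k samples drawn without replacement
    from n generated samples (c of them correct) passes the tests."""
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any draw of k includes a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results, k):
    """Average pass@k over tasks; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical (samples generated, samples correct) counts for three tasks.
per_task = [(10, 3), (10, 0), (10, 10)]
print(benchmark_pass_at_k(per_task, k=1))
```

For `k = 1` the estimator reduces to the fraction of correct samples, `c / n`, averaged over tasks.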
  • Item type: Item , Access status: Open Access ,
    Multimodal Representation Learning in Medical Using Vision-Language Models
    (2025-11-11) Baghbanzadeh, Negin; Dolatabadi, Elham
    Recent advances in multimodal models such as LLaVA and InstructBLIP highlight the importance of high-quality image encoders, particularly in the biomedical domain where figures and captions are complex. Existing medical vision–language datasets primarily emphasize scale, often overlooking data quality. In this thesis, we introduce OPEN-PMC, a carefully curated collection of biomedical image–text pairs derived from PubMed Central. OPEN-PMC incorporates multiple refinement steps, including compound figure decomposition with a modality-robust detector, subcaption segmentation, in-text reference extraction and summarization, and modality-aware classification. Using OPEN-PMC-2M, we conduct controlled experiments to quantify the effect of each processing step on contrastive pretraining. Our findings show that subfigure decomposition and enriched captions substantially improve retrieval, zero-shot classification, and robustness, outperforming larger but noisier datasets. Scaling to OPEN-PMC-18M, one of the largest curated biomedical VL datasets to date, we demonstrate state-of-the-art performance while discussing remaining limitations in large-scale contextual augmentation and clinical validation.
  • Item type: Item , Access status: Open Access ,
    AdapTrain: Adaptive Model Partitioning for Efficient Independent Subnet Training on Heterogeneous and Dynamic Cloud Infrastructures
    (2025-11-11) Naderi, Mohammadhossein; Khazaei, Hamzeh
    Modern distributed training systems face significant challenges in heterogeneous computing environments, where heterogeneity in computational resources among workers often leads to resource underutilization and extended training durations, particularly in resource-constrained environments. To address these challenges, we propose Adaptive Model Partitioning for Efficient Independent Subnet Training on Heterogeneous and Dynamic Cloud Infrastructures (AdapTrain), a novel framework that dynamically adjusts model partitioning to align with the computational capacities of heterogeneous workers. AdapTrain reduces the overhead of synchronization, thereby minimizing total end-to-end training time by ensuring synchronized completion of training rounds across all workers. Its adaptive design enables robust performance under workload variations, inherent resource heterogeneity, and multi-tenancy effects prevalent in cloud computing environments. An experimental evaluation of production workloads reveals that AdapTrain accelerates model convergence by more than 8x compared to the current training methods. Furthermore, AdapTrain integrates seamlessly into existing systems, introducing negligible system performance overhead while significantly enhancing training efficiency.
  • Item type: Item , Access status: Open Access ,
    On the Complexity of Telephone Broadcasting: From Cacti to Bounded Pathwidth Graphs
    (2025-11-11) Seyed Javadi, Seyed Mohammad; Kamali, Shahin
    In Telephone Broadcasting, the goal is to disseminate a message from a given source vertex of an input graph to all other vertices in the minimum number of rounds, where at each round, an informed vertex can send the message to at most one of its uninformed neighbors. We study the problem in cactus graphs and graphs of bounded pathwidth. Despite many previous efforts, the complexity of the problem in cactus graphs remained open. We settle this question by establishing the NP-completeness of broadcasting in cactus graphs and graphs of pathwidth 2. On the positive side, we present constant-factor approximation algorithms for the studied families of graphs, namely, an algorithm with an approximation factor of 2 for cactus graphs and an approximation factor of O(1) for graphs of constant pathwidth.
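To illustrate the round model defined in the abstract above, the sketch below computes the optimal broadcast time on a tree, where the problem is solvable exactly with a classical greedy recurrence: each vertex informs its children in decreasing order of the time their own subtrees need. This covers trees only; it is not the thesis's approximation algorithm for cactus graphs or bounded-pathwidth graphs, and the function name and example graphs are assumptions.

```python
def tree_broadcast_time(adjacency, source):
    """Minimum number of rounds to inform every vertex of a tree from source.

    adjacency maps each vertex to a list of its neighbours; the graph must be
    a tree. Recursive, so intended for small examples.
    """
    def rounds_needed(vertex, parent):
        # Time each child's subtree needs once that child is informed.
        child_times = sorted(
            (rounds_needed(child, vertex)
             for child in adjacency[vertex] if child != parent),
            reverse=True,
        )
        # The i-th child called is informed at round i and then needs
        # child_times[i - 1] more rounds; calling slow subtrees first is optimal.
        return max(
            (i + t for i, t in enumerate(child_times, start=1)),
            default=0,
        )

    return rounds_needed(source, None)


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(tree_broadcast_time(path, 0))  # 3
print(tree_broadcast_time(path, 1))  # 2
print(tree_broadcast_time(star, 0))  # 3
```

On the four-vertex path, starting from an interior vertex saves a round because the message can travel in both directions at once; on the star, the centre must call its three leaves one round at a time.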