Computer Science

Permanent URI for this collection

https://hdl.handle.net/10315/38508


Open Access
Examining the Effectiveness of Generative Artificial Intelligence for the Identification of Defeaters in Assurance Cases
(2024-07-18) Khakzad Shahandashti, Kimya; Boaye Belle, Alvine
Assurance cases are structured arguments that allow verifying the correct implementation of the created systems’ non-functional requirements (e.g., safety, security, reliability). This allows for preventing system failure. The latter may result in loss of life, severe injuries, large-scale environmental damage, property destruction, and major economic loss. Assurance cases support the certification of systems in compliance with industrial standards (e.g., DO-178C, ISO 26262). However, the presence of assurance weakeners - deficits and logical fallacies - signals gaps in evidence and reasoning. Addressing this, our research presents a comprehensive taxonomy for categorizing these assurance weakeners, alongside proposed management strategies. The taxonomy divides weakeners into four categories of uncertainty: aleatory, epistemic, ontological, and argumentation. It also categorizes management approaches into representation, identification, and mitigation. A critical aspect of strengthening assurance cases involves identifying argumentation uncertainty or defeaters. To automate this process, we explore the capabilities of GPT-4 Turbo, a sophisticated large language model by OpenAI. We focus on its application in detecting defeaters within assurance cases represented using Eliminative Argumentation (EA) notation. Our initial evaluation assesses GPT-4 Turbo’s proficiency in understanding and applying this notation, a key factor in effectively generating defeaters. The results indicate that GPT-4 Turbo is highly adept in EA notation, demonstrating its potential to generate a diverse range of defeaters, thereby enhancing the robustness and reliability of assurance cases. Moreover, when we used GPT-4 Turbo to identify defeaters, it demonstrated effective proficiency.
Open Access
Secure Abstraction of Fractionalization Smart Contracts for Non-Fungible Tokens
(2024-07-18) Haouari, Wejdene; Fokaefs, Marios
Non-fungible tokens (NFTs) have faced a downturn in interest, prompting a critical reassessment of their utility and accessibility. Fractionalization emerges as a solution: by enabling multiple parties to hold a stake in a single NFT, it lowers the barrier to entry for investors and enhances market liquidity. Implementing fractionalization relies on smart contracts, which govern the terms of division and transfer of fractions of an NFT. However, the complexity of these contracts and the immutable nature of blockchain underscore the importance of security. This thesis tackles the challenge of implementing fractionalization solutions and enhancing the security of supporting smart contracts. It thoroughly analyzes current fractionalization methods, identifies security vulnerabilities, and explores mitigation strategies to contribute to a safer and more inclusive NFT ecosystem. The goal is to propose a baseline for standardizing NFT fractionalization to improve interoperability and address security concerns, laying the groundwork for a more unified and secure ecosystem.
Open Access
Design and Automatic Generation of Safety Cases of ML-Enabled Autonomous Driving Systems
(2024-07-18) Sivakumar, Mithila; Belle, Alvine Boaye
Safety cases play a pivotal role in ensuring system reliability and acceptability, providing a structured argument supported by evidence. However, gaps in safety case literature hinder comprehensive safety assurance practices. In this thesis, we address this challenge through a three-fold approach. First, we conducted a bibliometric analysis following PRISMA 2020 guidelines to identify trends and knowledge gaps in safety assurance research. The analysis reveals critical areas lacking full safety cases and highlights the need for automated safety case construction. Then, we manually constructed a safety case for an ML-enabled component of an autonomous vehicle. Finally, leveraging large language models like GPT-4, we conducted experiments to automate safety case generation. Results indicate that GPT-4 produces safety cases with moderate accuracy and high semantic similarity to ground truth cases. This comprehensive methodology enhances safety practices, aiding researchers, analysts, and regulators in achieving robust safety assurance in complex systems.
Open Access
Enhancing code review for improved code quality with language model-driven approaches
(2024-03-16) Rahman, Shadikur; Prince, Enamul Hoque
Code review is essential for maintaining software development standards, yet achieving effective reviews and issue resolution remains challenging. This thesis introduces RefineCode, an application tool to find actionable code reviews and provide similar code reviews as references within an organization, aiding developers in resolving issues effectively. To this end, we collected 9,500 code reviews from five private projects in an industrial setting and empirically evaluated various classification methods for identifying actionable code reviews. RefineCode automatically recommends relevant solutions from Stack Overflow based on textual similarity and entity linking between code reviews and Stack Overflow issues. Additionally, it integrates a chatbot feature, leveraging large language models to propose potential solutions for actionable code reviews. These features empower developers to make informed decisions, enhancing code quality by guiding issue resolution without reinforcing misunderstandings.
Open Access
Towards Efficient and Robust Caching: Investigating Alternative Machine Learning Approaches for Edge Caching
(2024-03-16) Torabi, Hoda; Litoiu, Marin; Khazaei, Hamzeh
This study introduces HR-Cache, a caching framework designed to enhance the efficiency of edge caching. The increasing complexity and variability of traffic classes in edge environments pose significant challenges for traditional caching methods, which often rely on simplistic metrics. HR-Cache addresses these challenges by implementing a learning-based strategy grounded in Hazard Rate (HR) ordering, a concept originally used to establish cache performance upper bounds. By employing a lightweight supervised machine learning model, HR-Cache learns from HR-based caching decisions and predicts the "cache-friendliness" of incoming requests, identifying "cache-averse" objects as priority candidates for eviction. Our experimental results demonstrate HR-Cache's superior performance. It consistently achieves 2.2–14.6% greater WAN traffic savings compared to the LRU strategy and outperforms both heuristic and state-of-the-art learning-based algorithms, while adding minimal prediction overhead. Though designed with the limitations of edge caching in mind, HR-Cache can be adapted with minimal changes for broader applicability in various caching contexts.
Open Access
Trajectory Prediction Learning using Deep Generative Models
(2024-03-16) Li, Jing; Papagelis, Manos
Trajectory prediction involves estimating an object's future path using its current state and historical data, with applications in autonomous vehicles, robotics, and human motion analysis. Deep learning methods trained on historical data have been applied to this task, but they struggle with complex spatial dependencies due to the intricate nature of trajectory data and dynamic environments. We introduce TrajLearn, a novel trajectory prediction model using generative models and higher-order mobility flow representations (hexagons). TrajLearn, given a trajectory's recent history and current state, predicts its next k steps. It employs a variant of beam search for exploring multiple paths, ensuring spatial continuity. Our experiments demonstrate that TrajLearn surpasses current leading methods and other baselines by about 60% on various real-world datasets. We also explore different prediction horizons (k values), perform resolution sensitivity analysis, and conduct an ablation study to evaluate the contributions of different model components.
Open Access
Multi-Versioning and Microservices: A Strategy for Developing Reliable Software Systems
(2024-03-16) Akhtarian, Nazanin; Khazaei, Hamzeh
In the dynamic realm of software engineering, adaptability is key to sustaining system performance and reliability. Software iterations often bring about challenges such as unexpected bugs and performance issues, necessitating a nuanced approach to maintain system integrity. In this work, we propose employing software multi-versioning to enhance system reliability. We embark on an in-depth exploration of the reliability of microservices within chaotic environments. Using Chaos Mesh, we simulate a series of disruptions in a microservices-based application, i.e., the Online Boutique. Through real experimentation, we systematically introduce various chaos disruptions, such as Pod failures, response delay, and memory stress, to investigate their impact on the system's reliability. We define a reliability metric that quantifies the robustness and efficiency of each software version under adverse conditions. Leveraging this metric, we introduce a dynamic controller that adjusts the population of each version, ensuring optimal resource distribution, reliability and system performance. Additionally, our research evaluates how the system adapts to varying workloads. We investigate how well the system can adjust its scalability (specifically, the number of replicas) in response to changes in CPU usage as the user load fluctuates. Our findings demonstrate the system's capability to scale dynamically based on workload demands, ensuring robustness and efficiency. In conclusion, our study provides a detailed framework for employing software multi-versioning as a means to enhance system reliability. By devising a reliability metric and implementing a dynamic scaling system that responds to both reliability assessments and workload variations, we offer a comprehensive strategy to fortify systems against the unpredictable nature of software evolution, ensuring they remain resilient and make efficient use of resources.
Open Access
Key-Frame Based Motion Representations for Pose Sequences
(2024-03-16) Thasarathan, Harrish Patrick; Derpanis, Konstantinos
Modelling human motion is critical for computer vision tasks that aim to perceive human behaviour. Extending current learning-based approaches to successfully model long-term motions remains a challenge. Recent works rely on autoregressive methods, in which motions are modelled sequentially. These methods tend to accumulate errors and, when applied to typical motion modelling tasks, are limited to only four seconds. We present a non-autoregressive framework to represent motion sequences as a set of learned key-frames without explicit supervision. We explore continuous and discrete generative frameworks for this task and design a key-framing transformer architecture to distill a motion sequence into key-frames and their relative placements in time. We validate our learned key-frame placement approach against a naive uniform placement strategy and further compare key-frame distillation using our transformer architecture with an alternative common sequence modelling approach. We demonstrate the effectiveness of our method by reconstructing motions up to 12 seconds.
Open Access
Data Acquisition for Domain Adaptation of Closed-Box Models
(2024-03-16) Liu, Yiwei; Yu, Xiaohui
The machine learning (ML) marketplace provides customers with various ML solutions to accelerate their business. Models in the ML market are often available as closed boxes, but they may suffer from distribution shifts in new domains. Prior techniques cannot address this problem, because they are either impractical to use or incompatible with the closed-box property of the models. Instead, we propose to acquire extra data to construct a "padding" model to help the original closed box with its classification weaknesses in the target domain. Our solution consists of a "weakness detector" to discover the deficiency of the original closed-box model and the Augmented Ensemble approach to combine the source and the padding model for better performance in the target domain, further diversifying the ML marketplace. Extensive experiments on several popular benchmark datasets confirm the superiority and robustness of our proposed framework over baseline approaches.
Open Access
Examining Autoexposure for Challenging Scenes
(2024-03-16) Yang, Beixuan; Brown, Michael S.
Autoexposure (AE) is a critical step cameras apply to ensure properly exposed images. While current AE algorithms are effective in well-lit environments with unchanging illumination, these algorithms still struggle in environments with bright light sources or scenes with abrupt changes in lighting. A significant hurdle in developing new AE algorithms for challenging environments, especially those with time-varying lighting, is the lack of platforms to evaluate AE algorithms and suitable image datasets. To address this issue, we have designed a software platform allowing AE algorithms to be used in a plug-and-play manner with the dataset. In addition, we have captured a new 4D exposure dataset that provides a complete solution space (i.e., all possible exposures) over a temporal sequence with moving objects, bright lights, and varying lighting. Our dataset and associated platform enable repeatable evaluation of different AE algorithms and provide a much-needed starting point to develop better AE methods.
Open Access
Advancing Blind Face Restoration: Robustness and Identity Preservation with Integrated GAN and Codebook Prior Architectures
(2024-03-16) Tayarani Bathaie, Seyed Nima; An, Aijun
Blind Face Restoration (BFR) is a challenging task in computer vision, which aims to reconstruct High-Quality (HQ) facial images from Low-Quality (LQ) inputs. BFR presents as a challenging ill-posed problem, necessitating auxiliary information to constrain the solution space. While geometric and generative facial priors provide some support in BFR, their effectiveness wanes under intense degradation. Discrete codebook priors, though promising, grapple with the difficulty of associating intensely degraded images with their corresponding codes. To effectively address these limitations, this research introduces a two-stage restoration approach, termed Identity-embedded GAN and Codebook Priors (IGCP), which synergistically combines the strengths of both generative and codebook priors. In the first stage, our approach employs a Generative Prior Restorer (GPR) network for initial image restoration. Distinct from existing methods that apply identity-based losses to the final restored image, our work innovates by embedding identity information directly into the style vectors of the StyleGAN2 network during the generation process. This is achieved through the introduction of an identity-in-style loss, ensuring superior fidelity and identity preservation even in severely degraded images. Proceeding to the second stage, the approach utilizes a two-component framework known as the Codebook Prior Restorer (CPR) network. This framework comprises a Vector Quantized AutoEncoder (VQAE), which mitigates artifacts and adds a final touch of quality, complemented by a Feature Transfer Module (FTM) that is demonstrated to be necessary to ensure fidelity and identity preservation. Extensive experimental evaluations were conducted across five datasets, including our newly introduced CelebA-IntenseTest dataset. The results from these experiments demonstrate the remarkable efficacy of the IGCP approach. Notably, IGCP has shown exceptional performance in handling various degradation levels, setting new benchmarks in the domain of BFR.
Open Access
Active Visual Search: Investigating human strategies and how they compare to computational models
(2024-03-16) Wu, Tiffany; Tsotsos, John K.
Real-world visual search by fully active observers has not been sufficiently investigated. Whilst the visual search paradigm has been widely used, most studies use a 2D, passive observation task, where immobile subjects search through stimuli on a screen. Computational models have similarly been compared to human performance only to the degree of 2D image search. I conduct an active search experiment in a 3D environment, measuring eye and head movements of untethered subjects during search. Results show patterns forming strategies for search, such as repeated search paths within and across subjects. Learning trends were found, but only in target-present trials. Foraging models encapsulated subject location-leaving actions, whilst robotics models captured viewpoint selection behaviours. Eye movement models were less applicable to 3D search. The richness of data collected from this experiment opens many avenues of exploration, and the possibility of modelling active visual search in a more human-informed manner.
Open Access
Precision Recall Cover: A Method to Assess Generative Models
(2023-12-08) Cheema, Fasil Tariq; Urner, Ruth
Generative modelling has seen enormous practical advances over the past few years from LLMs like ChatGPT to image generation. However, evaluating the quality of a generative system is often still based on subjective human inspection. To overcome this, very recently, the research community has turned to exploring formal evaluation metrics and methods. In this work, we propose a novel evaluation method based on a two-way nearest neighbor test. We define a new measure of mutual coverage for two probability distributions. From this, we derive an empirical analogue and show analytically that it exhibits favorable theoretical properties while it is also straightforward to compute. We show that, while algorithmically simple, our derived method is also statistically sound. We complement our analysis with a systematic experimental evaluation and comparison to other recently proposed measures. Using a wide array of experiments, we demonstrate our algorithm’s strengths over other existing methods and confirm our results from the theoretical analysis.
Open Access
Investigating Calibrated Classification Scores through the Lens of Interpretability
(2023-12-08) Torabian, Alireza; Urner, Ruth
Calibration is a frequently invoked concept when useful label probability estimates are required on top of classification accuracy. A calibrated model is a scoring function whose scores correctly reflect underlying label probabilities. Calibration in itself, however, does not imply classification accuracy or human-interpretable estimates, nor is it straightforward to verify calibration from finite data. There is a plethora of evaluation metrics (and loss functions), each of which assesses a specific aspect of a calibration model. In this work, we initiate an axiomatic study of the notion of calibration and evaluation measures for calibration. We catalogue desirable properties of calibration models as well as evaluation metrics and analyze their feasibility and correspondences. We complement this analysis with an empirical evaluation, comparing two metrics and comparing common calibration methods to employing a simple, interpretable decision tree.
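To make the notion of a calibrated scoring function concrete, the following is a minimal sketch of one widely used evaluation metric, the binned expected calibration error (ECE), for binary labels. It is an illustration only: the abstract does not say which metrics the thesis compares, and the function name, the equal-width binning, and the default of 10 bins are assumptions made here.

```python
def expected_calibration_error(scores, labels, n_bins=10):
    """Binned ECE for binary classification (illustrative, not the thesis's metric).

    scores: predicted probabilities of the positive class, each in [0, 1].
    labels: true labels, each 0 or 1.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("at least one example is required")

    # Assign each example to an equal-width bin by its score.
    bins = [[] for _ in range(n_bins)]
    for score, label in zip(scores, labels):
        index = min(int(score * n_bins), n_bins - 1)  # a score of 1.0 goes in the last bin
        bins[index].append((score, label))

    # Weighted average gap between mean score and observed positive rate per bin.
    total = len(scores)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_score = sum(s for s, _ in members) / len(members)
        positive_rate = sum(y for _, y in members) / len(members)
        ece += (len(members) / total) * abs(mean_score - positive_rate)
    return ece


if __name__ == "__main__":
    # Scores of 0.25 with one positive in four match the observed rate.
    print(expected_calibration_error([0.25] * 4, [1, 0, 0, 0]))
    # Confident scores of 0.9 with no positives are badly calibrated.
    print(expected_calibration_error([0.9] * 4, [0, 0, 0, 0]))
```

A low ECE on a finite sample does not establish calibration, and a constant predictor can score well on it while being useless for classification, which is consistent with the abstract's point that calibration alone implies neither accuracy nor interpretability.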
Open Access
Leveraging Deep Learning for Trajectory Similarity Learning and Trajectory Pathlet Dictionary Construction
(2023-12-08) Alix, Gian Carlo Idris; Papangelis, Emmanouil
The rapid development of geospatial technologies and location-based devices has motivated the research community of trajectory data mining, due to numerous applications including route planning and navigation services. Of interest are similarity search tasks that several works addressed through representation learning. Our method ST2Box offers refined representations by first representing trajectories as sets of roads, then adapting set-to-box architectures for learning accurate, versatile, and generalizable set representations of trajectories for preserving similarity. Experimentally, ST2Box outperforms baselines by up to ~38%. Another related problem involves constructing small sets of building blocks that can represent wide-ranging trajectories (pathlet dictionaries, or PDs). However, existing methods for constructing PDs are memory-intensive. Thus, we propose PathletRL for generating dictionaries that offer significant memory savings. It initializes unit-length pathlets and iteratively merges them while maximizing utility, which is approximated using a deep reinforcement learning-based method. Empirically, PathletRL can reduce its dictionary's size by up to 65.8% against state-of-the-art methods.
Open Access
Chart Question Answering with an Universal Vision-Language Pretraining Approach
(2023-12-08) Kavehzadeh, Parsa; Prince, Enamul Hoque
Charts are widely used for data analysis, providing visual representations and insights into complex data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently, including chart question answering. However, existing methods for these tasks often rely on pretraining on language or vision-language tasks, neglecting the explicit modeling of chart structures. To address this, we first build a large corpus of charts covering diverse topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. Our experiments demonstrate that pretraining UniChart on a large corpus with chart-specific objectives, followed by fine-tuning, yields state-of-the-art performance on four downstream tasks. Moreover, our model exhibits superior generalizability to an unseen chart corpus, surpassing previous approaches that lack chart-specific objectives and utilize limited chart resources.
Open Access
Evaluating Temporal Queries over Videos
(2023-12-08) Chen, Yueting; Yu, Xiaohui
Videos have been an important part of people's daily lives and are continuously growing in terms of volume, size, and variety of content. Recent advances in Computer Vision (CV) algorithms have improved accuracy and efficiency, making video annotations possible with high accuracy. In this work, we follow a general framework to first obtain annotations utilizing state-of-the-art CV algorithms, and then consider three research problems on evaluating temporal queries with such annotations. Specifically, we first investigate the temporal queries that consider only co-occurrence relationships between objects on video feeds, where we take the first step and define such queries in a way that they incorporate certain physical aspects of video capture such as object occlusion. We propose two techniques, Marked Frame Set (MFS) and Sparse State Graph (SSG), to organize all detected objects in the intermediate data generation layer, which effectively, given the queries, minimizes the number of objects and frames that have to be considered during query evaluation. Then, we consider the query with a ranking mechanism that aims to retrieve clips from large video repositories in which objects co-occur in a query-specified fashion. We propose a two-phased approach, where we build indexes during the Ingestion Phase, and then answer queries during the Query Phase using the Partition-Based Query Processing (PBQP) algorithm, which efficiently produces the desired (query-specified) number of results with the highest scores. Finally, we further consider both spatial and temporal information with graph representations and define the problem of Spatial and Temporal Constrained Ranked Retrieval (STAR Retrieval) over videos. Based on the graph representation, we propose a two-phase approach, consisting of the ingestion phase, where we construct and materialize the Graph Index (GI), and the query phase, where we compute the top-ranked windows (video clips) according to the window matching score efficiently. We propose two algorithms to perform Spatial Matching (SMA) and Temporal Matching (TM) separately with an early-stopping mechanism. We present the details of the above three research problems and our proposed methods. Via experiments conducted on various datasets, we show the effectiveness of our proposed methods.
Open Access
Fine Granularity is Critical for Intelligent Neural Network Pruning
(2023-12-08) Heyman, Andrew Baldwin; Zylberberg, Joel
Neural network pruning is a popular approach to reducing the computational costs of training and/or deploying a network, and aims to do so while minimizing accuracy loss. Pruning methods that remove individual weights (fine granularity) yield better ratios of accuracy to parameter count, while methods that preserve some or all of a network’s structure (coarser granularity, e.g. pruning channels from a CNN) take better advantage of hardware and software optimized for dense matrix computations. We compare intelligent iterative pruning using several different criteria sampled from the literature against random pruning at initialization across multiple granularities on two different image classification architectures and tasks. We find that the advantage of intelligent pruning (with any criterion) over random pruning decreases dramatically as granularity becomes coarser. Our results suggest that, compared to coarse pruning, fine pruning combined with efficient implementation of the resulting networks is a more promising direction for improving accuracy-to-cost ratios.
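The difference between fine and coarse granularity can be sketched on a toy weight matrix. The example below is an assumption-laden illustration, not the thesis's experimental setup: it uses weight magnitude as the pruning criterion (one common choice among the several criteria the abstract mentions without naming), treats each matrix row as a stand-in for a channel, and all function names are hypothetical.

```python
def prune_weights(matrix, fraction):
    """Fine granularity: zero the given fraction of individual weights,
    choosing those with the smallest absolute value."""
    cells = sorted(
        (abs(w), i, j) for i, row in enumerate(matrix) for j, w in enumerate(row)
    )
    count = int(len(cells) * fraction)
    pruned = [row[:] for row in matrix]  # copy so the input is left untouched
    for _, i, j in cells[:count]:
        pruned[i][j] = 0.0
    return pruned


def prune_rows(matrix, fraction):
    """Coarse granularity: zero the given fraction of whole rows,
    choosing those with the smallest L1 norm."""
    norms = sorted((sum(abs(w) for w in row), i) for i, row in enumerate(matrix))
    count = int(len(matrix) * fraction)
    dropped = {i for _, i in norms[:count]}
    return [
        [0.0] * len(row) if i in dropped else row[:]
        for i, row in enumerate(matrix)
    ]


def kept_magnitude(matrix):
    """Total absolute weight that survives pruning."""
    return sum(abs(w) for row in matrix for w in row)


if __name__ == "__main__":
    weights = [[4.0, 0.1], [0.2, 3.0]]
    fine = prune_weights(weights, 0.5)
    coarse = prune_rows(weights, 0.5)
    print("fine:  ", fine, kept_magnitude(fine))
    print("coarse:", coarse, kept_magnitude(coarse))
```

Both calls remove half of the parameters, but the fine-grained variant keeps both large weights while row pruning has to discard the 3.0 along with its small neighbour. Retained magnitude is only a rough proxy for accuracy; the sketch shows why coarser units constrain what a criterion can keep, and it says nothing about the hardware advantages of structured sparsity that the abstract also notes.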
Open Access
A 360-degree Omnidirectional Photometer Using a Ricoh Theta Z1
(2023-12-08) MacPherson, Ian Michael; Brown, Michael S.
Spot photometers measure the luminance emitted or reflected from a small surface area in a physical environment. Because the measurement is limited to a "spot," capturing dense luminance readings for an entire environment is impractical. This thesis demonstrates the potential of using an off-the-shelf commercial camera to operate as a 360-degree luminance meter. The method uses the Ricoh Theta Z1 camera, which provides a full 360-degree omnidirectional field of view and an API to access the camera's minimally processed RAW images. Working from the RAW images, this thesis describes a calibration method to map the RAW images under different exposures and ISO settings to luminance values. By combining the calibrated sensor with multi-exposure high-dynamic-range imaging, a cost-effective mechanism for capturing dense luminance maps of environments is provided. The results show that the Ricoh Theta calibrated as a luminance meter performs well when validated against a significantly more expensive spot photometer.
Open Access
Query-Aware Data Systems Tuning via Machine Learning
(2023-12-08) Henderson, Connor Dustin; Szlichta, Jarek
Modern data systems have hundreds of system configuration parameters which heavily influence the performance of business queries. Manual configuration by experts is painstaking and time consuming. We propose a query-informed tuning system called BLUTune which uses deep reinforcement learning based on advantage actor-critic neural networks to tune configurations within defined resource constraints. We translate high-dimensional query execution plans into a low-dimensional embedding space and illustrate the usefulness of query embeddings for the downstream task of data systems tuning. We train our model based on the estimated cost of queries then fine-tune it using query execution times. We present an experimental study over various synthetic and real-world workloads. One model uses TPC-DS queries such that there are tables from the schema that are not seen during training time. The second is trained under resource constraints to show how the model performs when we limit the memory the system has access to.