Information Systems and Technology

Permanent URI for this collection

https://hdl.handle.net/10315/27588

Open Access
A Hierarchical Rule-Based Security Management System for Date-Intensive Applications
(2018-11-21) Rouf, Yar Akhter; Litoiu, Marin
Applications in today's software development environment evolve at a rapid rate, constantly providing their users with new functionalities. As a result, it becomes increasingly complex to understand the entire application. The security team and the developers may not completely understand each other's approaches, resulting in a less secure system with vulnerabilities. In addition, there is a large amount of security data to be analyzed. To mitigate these issues, we propose a platform to support the SecDevOps framework, a hierarchical distributed architecture for security control that uses a Business Rules Engine (BRE). The BRE simplifies security rules by allowing the teams to write them at an operational level rather than at the network level, which requires specialized knowledge. Business rules are universally understood by the different teams, resulting in effective inter-team communication. Additionally, the platform can expand and scale with new security rules and data sources at runtime in a systematic manner.
Open Access
A Hybrid Approach for Large-Scale Product Categorization Based on Weighted KNN and LSTM-BPV
(2019-12-04) Hu, Haohao; Huang, Xiangji
In modern e-commerce systems, large volumes of new items are added to the product list every day, which calls for automatic product categorization. In this thesis we propose a weighted K-Nearest Neighbour (KNN) based classification system for solving the large-scale e-commerce product taxonomy classification problem. We use an information retrieval (IR) model as the similarity function in our weighted KNN algorithm. Among all IR models used in this study, we achieved the highest classification performance by using the information-based (IB) model as the similarity function in the KNN algorithm. Moreover, our proposed method can improve the overall performance when its prediction results are combined with those from an advanced neural network based method, namely Long Short-Term Memory with Balanced Pooling Views (LSTM-BPV). The hybrid system could achieve results comparable to the state of the art (SotA). We also get good results by fine-tuning a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model.
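As an illustration of the weighted KNN idea in the abstract above, the following is a minimal sketch, not the thesis implementation. It assumes a simple token-overlap (Jaccard) score as a stand-in for the IR similarity models the thesis evaluates; the function names and the tiny catalogue are hypothetical.

```python
from collections import defaultdict


def jaccard(a, b):
    """Stand-in similarity: token overlap between two titles (NOT an IR model)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def weighted_knn(query, labelled, k=3, sim=jaccard):
    """Pick the category whose members among the k nearest items
    have the largest summed similarity to the query."""
    scored = sorted(((sim(query, text), label) for text, label in labelled),
                    reverse=True)
    votes = defaultdict(float)
    for score, label in scored[:k]:
        votes[label] += score  # each neighbour votes with its similarity
    return max(votes, key=votes.get)


# Hypothetical labelled product titles.
catalogue = [
    ("wireless optical mouse", "electronics"),
    ("usb gaming mouse", "electronics"),
    ("cotton crew socks", "clothing"),
    ("wool hiking socks", "clothing"),
]

print(weighted_knn("wireless gaming mouse", catalogue))
```

Any other similarity function with the same two-argument shape can be passed as `sim`, which is where an IR scoring model would plug in.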
Open Access
A Methodology for Eliciting and Ranking Control Points for Adaptive Systems
(2015-08-28) Zoghi, Parisa; Litoiu, Marin
Designing an adaptive system to meet its quality constraints in the face of environmental uncertainties, such as variable demands, can be a challenging task. In a cloud environment, a designer also has to consider and evaluate different control points, i.e., those variables that affect the quality of the software system. This thesis presents a method for eliciting, evaluating and ranking control points for web applications deployed in cloud environments. The proposed method consists of several phases that take a high-level stakeholders' adaptation goal and transform it into lower-level MAPE-K loop control points. The MAPE-K loop is then activated at runtime using an adaptation algorithm. We conducted several experiments to evaluate the different phases of the methodology and we report the results and the lessons learnt.
Open Access
A Natural Language Question Answering System for Exploring Online Conversations
(2021-03-08) Siddiqui, Nadia Ashfaq; Prince, Enamul Hoque
The proliferation of social media has resulted in the exponential growth of online conversations. Due to the volume and complexity of conversations, it is often extremely difficult to gain insights from such conversations. This dissertation hypothesizes that synergetic integration of natural language processing with information visualization techniques can help users to better fulfill their information needs. More specifically, we developed a question-answering method that allows the user to ask questions about a conversation and then automatically answers the question by highlighting results in a visual interface. The visual interface, named ConVisQA, was developed by extending ConVis which visually summarizes a conversation by providing an overview of topics and sentiment information. We demonstrate the effectiveness of our approach through a user study with blog readers. The dissertation concludes with a user study comparing our interface with a traditional interface for blog reading as well as considerations for future work.
Open Access
Adaptive Mechanisms for Mobile Spatio-Temporal Applications
(2014-07-09) Theodorou, Vasileios; Litoiu, Marin
Mobile spatio-temporal applications play a key role in many mission-critical fields, including Business Intelligence, Traffic Management and Disaster Management. They are characterized by high data volume and velocity and by a large and variable number of mobile users. The design and implementation of these applications should not only consider this variability, but also support other quality requirements such as performance and cost. In this thesis we propose an architecture for mobile spatio-temporal applications, which enables multiple angles of adaptivity. We also introduce a two-level adaptation mechanism that ensures system performance while facilitating scalability and context-aware adaptivity. We validate the architecture and adaptation mechanisms by implementing a road quality assessment mobile application as a use case and by performing a series of experiments in a cloud environment. We show that our proposed architecture can adapt at runtime and maintain service level objectives while offering cost-efficiency and robustness.
Open Access
Advancement of Data-Driven Short-Term Flood Predictions on an Urbanized Watershed Using Preprocessing Techniques
(2018-11-21) Zistler, Marina; Erechtchoukova, Marina G.
Supervised classification can be applied for short-term predictions of hydrological events in cases where the label of the event rather than its magnitude is crucial, as in the case of early flood warning systems. To be effective, these warning systems must be able to forecast floods accurately and to provide estimates early enough. Following the approach of transforming hydrological sensor data into a phase space using time-delay embedding, an attempt was made to improve the performance of the models and to increase the lead-time of reliable predictions. For this, the available set of attributes supplied by stream and rain gauges was extended by derivatives. In addition, imbalanced data techniques were applied at the data preprocessing step. The computational experiments were conducted on various data sets, lead-times, and years with different hydrological characteristics. The results show that derivatives of water level data in particular improve model performance, increasingly so when added for only one or two hours before the prediction time. In addition, the imbalanced data techniques allowed for overall improved prediction of floods at the cost of a slight increase in misclassification of low-flow events.
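The time-delay embedding and derivative attributes mentioned in the abstract above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the thesis pipeline: the derivative is approximated by a first difference, only the most recent difference is appended, and the function names, embedding dimension, and sample readings are all hypothetical.

```python
def delay_embed(series, dim, tau=1):
    """Map a scalar series to vectors (x[t], x[t-tau], ..., x[t-(dim-1)*tau])."""
    start = (dim - 1) * tau
    return [[series[t - j * tau] for j in range(dim)]
            for t in range(start, len(series))]


def first_differences(series):
    """Discrete stand-in for a derivative: change between consecutive readings."""
    return [b - a for a, b in zip(series, series[1:])]


def build_examples(levels, flood_labels, dim=3, lead=1):
    """Pair each embedded vector (plus the latest difference) with the
    event label observed `lead` steps after the prediction time."""
    diffs = [0.0] + first_differences(levels)  # pad so diffs[t] aligns with levels[t]
    examples = []
    for t in range(dim - 1, len(levels) - lead):
        features = [levels[t - j] for j in range(dim)] + [diffs[t]]
        examples.append((features, flood_labels[t + lead]))
    return examples


# Hypothetical hourly water levels and flood/no-flood labels.
levels = [1.0, 1.5, 2.5, 4.0, 4.2]
labels = [0, 0, 0, 1, 1]
for features, label in build_examples(levels, labels):
    print(features, "->", label)
```

The resulting feature/label pairs could then be fed to any supervised classifier; increasing `lead` corresponds to asking for a longer prediction lead-time.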
Open Access
An Adaptive Architecture for Internet of Things Applications
(2019-03-05) Ramprasad, Brian Annil; Litoiu, Marin
The number of IoT devices has been growing exponentially as new products are being developed and legacy systems are becoming Internet enabled. As a consequence of this trend, the large amounts of traffic generated by IoT devices require new approaches to platform design and workload management. The primary challenge with managing IoT devices is that the traffic can be highly variable due to the type and the time of use. To be able to maintain quality of service standards on an IoT platform, these traffic patterns need to be modeled and understood so we can adapt the architecture dynamically. To address these challenges, we propose an adaptable architecture, a platform to emulate IoT devices, and a smart testing framework to detect bottlenecks that can predict the demand for computing resources. We show that in certain cases we can predict the demand for computing resources with a high degree of accuracy.
Open Access
An Approach to Designing Clusters for Large Data Processing
(2015-08-28) Sandel, Roni; Litoiu, Marin
Cloud computing is increasingly being adopted due to its cost savings and ability to scale. As data continues to grow rapidly, an increasing number of institutions are adopting non-standard SQL clusters to address the storage and processing demands of large data. However, evaluating and modelling non-SQL clusters presents many challenges. In order to address some of these challenges, this thesis proposes a methodology for designing and modelling large-scale processing configurations that respond to the end user requirements. Firstly, goals are established for the big data cluster. In this thesis, we use performance and cost as our goals. Secondly, the data is transformed from a relational data schema to an appropriate HBase schema. In the third step, we iteratively deploy different clusters. We then model the clusters and evaluate different topologies (size of instances, number of instances, number of clusters, etc.). We use HBase as the large data processing cluster and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.
Open Access
An Empirical Study on the Role of Requirement Engineering in Agile Method and Its Impact on Quality
(2015-08-28) Rahman, Anzira; Cysneiros, Luiz Marcio
Agile methods are characterized as flexible and easily adaptable. The need to keep up with multiple high-priority projects and shorter time-to-market demands could explain their increasing popularity. It also raises concerns about whether the use of these methods jeopardizes quality. Since Agile methods allow for changes throughout the process, they also create the possibility of affecting software quality at any time. This thesis examines the process of requirement engineering as performed with Agile methods in terms of its similarities and differences to requirement engineering as performed with the more traditional Waterfall method. It compares both approaches from a software quality perspective using a case study of 16 software projects. The main contribution of this work is to bring empirical evidence from real-life cases that illustrate how Agile methods significantly impact software quality, including the potential for a larger number of defects due to poor non-functional requirements elicitation.
Open Access
An Initial Analysis on the Impact of Software Transparency and Privacy on a Healthcare Environment
(2016-09-20) Zinovatna, Olena; Cysneiros, Luiz Marcio
Transparency and privacy are two fundamental parts of any democratic society. Although both transparency and privacy are essential in today's environment, they are often conflicting. Allowing more transparency is likely to impact privacy; likewise, preserving privacy often reduces transparency. With the consistently evolving nature of information technology and a tremendous amount of data being generated on a daily basis, there is a growing need to balance privacy and transparency in order to exist in the fast-paced environment. The purpose of this work is to understand the current state of software transparency and privacy as well as how it is being perceived in the workplace. This thesis focuses on the following three objectives. First, it supports the development of the catalogues documenting all existing privacy concerns and how they relate to transparency. Second, it narrows down its focus to the healthcare domain. Lastly, it evaluates the current state of software transparency in existing health information systems.
Open Access
Automatic Image Recognition of Rapid Malaria Emergency Diagnosis: A Deep Neural Network Approach
(2018-03-01) Liang, Zhaohui; Huang, Xiangji
Deep learning is the state-of-the-art artificial intelligence (AI) method for visual pattern detection and automated diagnosis. This paper describes the application of a convolutional neural network (CNN), the deep learning model for visual recognition, to the automatic detection of plasmodium-parasitized red blood cells for malaria field screening and rapid diagnosis. The malaria thin blood smears are from Bangladesh and were initially labeled by a specialist. 27,578 red blood cell images are segmented (raw set). The images are rotated clockwise three times to generate an augmented dataset with 110,312 red blood cell images. Two CNN-based Malaria Net models, one with 12 layers and one with 18 layers, are applied to classify both the raw dataset and the augmented dataset. The performance is evaluated by ten-fold cross-validation and compared to a transfer learning model. In the ten-fold cross-validation test for Malaria Net, the average accuracy is 97.37% (18-layer) and 96.09% (12-layer) with the raw set, and is 97.93% and 96.75% with the augmented set, in comparison to 91.99% with the raw set and 94.26% with the augmented set in transfer learning. In addition, the two CNN models show superiority over transfer learning in all performance indicators such as sensitivity, specificity, precision, F1 score, and Matthews correlation coefficient. The Malaria Net can accurately detect malaria-infected red blood cells. A CNN model trained by domain-specific data shows superior performance over the transfer-learning method. Automatic image classification powered by deep learning offers not only an accurate method for malaria field screening and rapid diagnosis but also a new solution for malaria control, especially in resource-poor regions.
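The augmented dataset size in the abstract above follows from keeping each original image and adding three rotated copies, so 27,578 x 4 = 110,312. A minimal sketch of that augmentation step is given below; it assumes 90-degree rotations (the abstract does not state the angle) and represents an image as a plain nested list, both of which are illustrative choices rather than details from the paper.

```python
def rotate_cw(image):
    """Rotate a 2-D list of pixels 90 degrees clockwise (assumed angle)."""
    return [list(row) for row in zip(*image[::-1])]


def augment(image):
    """Return the original image followed by three successive clockwise rotations."""
    variants = [image]
    for _ in range(3):
        variants.append(rotate_cw(variants[-1]))
    return variants


# Hypothetical 2x2 "image" to show the four variants.
tile = [[1, 2],
        [3, 4]]
for variant in augment(tile):
    print(variant)

# Each raw image yields four images, matching the reported dataset sizes.
raw_count = 27_578
print(raw_count * len(augment(tile)))
```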
Open Access
Automating Software Customization via Crowdsourcing using Association Rule Mining and Markov Decision Processes
(2015-01-26) Hamidi, Saeideh; Liaskos, Sotirios
As systems grow in size and complexity so do their configuration possibilities. Users of modern systems are easily confused and overwhelmed by the number of choices they need to make in order to fit their systems to their exact needs. In this thesis, we propose a technique to select what information to elicit from the user so that the system can recommend the maximum number of personalized configuration items. Our method is based on constructing configuration elicitation dialogs by utilizing crowd wisdom. A set of configuration preferences in the form of association rules is first mined from a crowd configuration data set. Possible configuration elicitation dialogs are then modeled through Markov Decision Processes (MDPs). Within the model, association rules are used to automatically infer configuration decisions based on knowledge already elicited earlier in the dialog. This way, an MDP solver can search for elicitation strategies which maximize the expected number of automated decisions, thereby reducing elicitation effort and increasing user confidence in the result. We conclude by reporting results of a case study in which this method is applied to the privacy configuration of Facebook.
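The inference step described in the abstract above, where association rules fill in configuration decisions from answers already elicited, can be sketched as simple forward chaining. This is only an illustration under assumptions: the rule format, setting names, and values are hypothetical, rule confidence is ignored, and the MDP search over elicitation strategies is not shown.

```python
def infer(known, rules):
    """Apply association rules repeatedly until no new setting can be inferred.

    `known` maps setting names to values already elicited from the user.
    Each rule is (antecedent_dict, (setting, value)).
    """
    settings = dict(known)
    changed = True
    while changed:
        changed = False
        for antecedent, (key, value) in rules:
            matches = all(settings.get(k) == v for k, v in antecedent.items())
            if key not in settings and matches:
                settings[key] = value  # decision made automatically
                changed = True
    return settings


# Hypothetical rules, as if mined from crowd configuration data.
rules = [
    ({"profile_visibility": "friends"}, ("photo_visibility", "friends")),
    ({"photo_visibility": "friends"}, ("tagging_review", "on")),
]

elicited = {"profile_visibility": "friends"}
result = infer(elicited, rules)
print(result)
print("automated decisions:", len(result) - len(elicited))
```

The count of automated decisions printed at the end is the kind of quantity an elicitation strategy would try to maximize when choosing which question to ask next.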
Open Access
Comparative Analysis of Language Models on Augmented Low-Resource Datasets for Application in Question & Answering Systems
(2024-11-07) Ranjbargol, Seyedehsamaneh; Erechtchoukova, Marina G.
This thesis aims to advance natural language processing (NLP) in question-answering (QA) systems for low-resource domains. The research presents a comparative analysis of several pre-trained language models, highlighting their performance enhancements when fine-tuned with augmented data to address several critical questions, such as the effectiveness of synthetic data and the efficiency of data augmentation techniques for improving QA systems in specialized contexts. The study focuses on developing a hybrid QA framework that can be integrated with a cloud-based information system. This approach refines the functionality and applicability of QA systems, boosting their performance in low-resource settings by using targeted fine-tuning and advanced transformer models. The successful application of this method demonstrates the significant potential for specialized, AI-driven QA systems to adapt and thrive in specific environments.
Open Access
Comparative Analysis of Transformer-Based Language Models for Text Analysis in the Domain of Sustainable Development
(2023-08-04) Safwat, Nabil; Erechtchoukova, Marina G.
With advancements in Artificial Intelligence, Natural Language Processing (NLP) has gained a lot of attention because of its potential to facilitate complex human-machine interactions, enhance language-based applications, and automate processing of unstructured texts. The study investigates the transfer learning approach on Transformer-based language models, the abstractive text summarization approach, and their application to the domain of Sustainable Development with the goal of determining SDG representation in scientific publications using the text summarization technique. To achieve this, the traditional transfer learning framework was expanded so that: (1) the relevance of textual documents to specified text can be evaluated, (2) neural language models, namely BART and T5, were selected, and (3) 8 text similarity measures were investigated to identify the most informative ones. Both the BART and T5 models were fine-tuned on an acquired domain-specific corpus of scientific publications extracted from the Scopus Elsevier database. The relevance of recently published works to an SDG was determined by calculating semantic similarity scores between each model-generated summary and the SDG’s description. The proposed framework made it possible to identify goals that dominated the developed corpus and those that require further attention from the research community.
Open Access
Comparing Representations of Contribution Labels in Goal Models
(2020-05-11) Tambosi, Wisal Yousef S.; Liaskos, Sotirios
Goal models have been proposed to be an effective method to support decision making in early requirements engineering. Key to using them is the concept of contribution links that represent how the satisfaction of one goal affects that of another. Multiple proposals have been offered for representing contribution; however, the degree to which users can intuitively understand the meaning behind contribution representations and utilize them appropriately has not been thoroughly studied. This work reports the results of an experimental study that compares the intuitiveness of two contribution representation approaches by measuring the performance of untrained users and exploring the role of individual differences (cognitive styles and arithmetic attitude and ability) in establishing the right intuition. Results show significant differences between the two representations as well as effects of various levels of individual factors. The results inspire further research on contribution links and support the operationalizability of intuitiveness as a criterion for evaluating conceptual modelling language designs.
Open Access
Data Analytics in Climate Change Studies
(2022-08-08) Bhardwaj, Eshta; Khaiter, Peter A.
The observed trends of climate change have wide ramifications for the sustainability of a normal lifestyle. The application of data analysis techniques can increase the knowledge around climate studies while introducing additional research methods. The proposed research showcases the development of a novel data analytics framework to address the gap in data modeling for the comparison of climate model data and observational data. A detailed data pipeline for data collection, extraction, wrangling, analysis, and visualization is discussed. The practical implementation of the framework is presented through a visualization tool, Weather Analysis Regional Model, to aid researchers and practitioners in their analysis of climate data.
Open Access
Data-Driven Causal Decision Support for Business Process Management
(2024-07-18) Jandaghi Alaee, Ali; Senderovich, Arik
Control-flow and resource assignment decisions influence business processes. Recorded process data can be used to identify which decisions are informed by data, to predict their outcome, and to guide interventions as part of a what-if analysis. The latter requires causal models that explain decisions. Yet, existing methods are limited: they focus on control-flow decisions only, ignore potential confounders, and use ad-hoc methods to resolve causal conflicts. We fill this gap by introducing a causal decision modeling framework which uncovers confounding effects and captures resource decisions. Moreover, we provide a process-aware causal discovery algorithm that takes process precedence into account. In addition, we employ domain knowledge to include unobserved factors. We address the problem of identification, conduct interventional outcome prediction and improve decision-making by acquiring unavailable data to maximize the utility of interventions. We demonstrate the feasibility of our approach through a set of experiments on synthetically generated and real-world datasets.
Open Access
Deconstructing And Restyling SVG Charts Using Large Language Models
(2025-04-10) Zaidi, Syed Muhammad Ali Raza; Hoque Prince, Enamul
SVG charts are very common on the Web; however, reusing, editing and restyling these charts is very difficult. To facilitate this process, this thesis explores the challenges of extracting data and visual encodings from SVG chart images and restyling them based on user queries. We leverage large language models (LLMs) with few-shot prompting approaches, enabling users to deconstruct and restyle existing Vega-Lite visualizations through natural language input. Our evaluation on 800 SVG charts and 250 natural language queries reveals that our system accurately deconstructs 93.4% of the charts and successfully restyles the charts for 38.6% of the queries. Finally, based on the above techniques, we develop a Chrome plugin tool that detects and deconstructs SVG charts from the web page and then restyles the charts based on user input.
Open Access
Design Approach for Building Technology with Indigenous Communities
(2022-08-08) Rizvi, Alina; Chen, Stephen
An increase in demand for mobile platforms in the last decade has led to a widespread need for platform development methods. While these standards do work well for a majority of mobile developers, one audience that can be neglected is the urban Indigenous population of youth in Toronto. Through experience, relationships and an understanding of significant cultural practices and teachings, this study proposes a unique mobile development approach. This approach is tailored specifically towards urban Indigenous youth in Toronto, incorporating the Anishinaabe Medicine Wheel, 7 Grandparent Teachings, and Sharing Circles as main influencers. It also features an experience report of how the mobile development approach worked in practice. Two mobile platforms were built using this approach and achieved successful results, with both becoming popular applications within their respective target audiences. This approach places a focus on the users and essentially aims to have the target audience be the main deciding factor in how the developed platform looks and functions. The motivation behind this study is to make technology less exclusive, and more accessible to a diverse population.
Open Access
Dynamic Elastic Provisioning For NFV-Enabled 5G Networks Using Machine Learning
(2023-03-28) Ali, Khalid; Jammal, Manar
5G networks are expected to support a variety of services and applications by having more stringent latency, reliability, and bandwidth requirements compared to previous generations. To meet these requirements, Open Radio Access Networks (O-RAN) have been proposed. The O-RAN Alliance assumes O-RAN components to be Virtualized Network Functions (VNFs). Furthermore, O-RAN allows employing Machine Learning (ML) solutions to tackle challenges in resource management. However, intelligently managing resources for O-RAN can prove challenging. Network providers need to dynamically scale resources in response to incoming traffic. Elastically allocating resources provides higher flexibility, reduces OPerational EXpenditure (OPEX), and increases resource utilization. In this work, we propose and evaluate an elastic VNF orchestration framework for O-RAN. The proposed system consists of a traffic forecasting-based dynamic scaling scheme using ML, and a Reinforcement Learning (RL) based VNF placement policy. The models are evaluated based on their predictive capabilities subject to all Service-Level Agreements.