Information Systems and Technology

Recent Submissions

Now showing 1 - 20 of 53
  • Open Access
    Data-Driven Causal Decision Support for Business Process Management
    (2024-07-18) Jandaghi Alaee, Ali; Senderovich, Arik
    Control-flow and resource assignment decisions influence business processes. Recorded process data can be used to identify the data that inform decisions, to predict decision outcomes, and to guide interventions as part of a what-if analysis. The latter requires causal models that explain decisions. Yet existing methods are limited: they focus on control-flow decisions only, ignore potential confounders, and use ad-hoc methods to resolve causal conflicts. We fill this gap by introducing a causal decision modeling framework that uncovers confounding effects and captures resource decisions. Moreover, we provide a process-aware causal discovery algorithm that takes process precedence into account. In addition, we employ domain knowledge to include unobserved factors. We address the problem of causal identification, conduct interventional outcome prediction, and improve decision-making by acquiring unavailable data to maximize the utility of interventions. We demonstrate the feasibility of our approach through a set of experiments on synthetically generated and real-world datasets.
  • Open Access
    Optimizing Data Compression via Data Reordering Strategies
    (2024-07-18) Du, Qinxin; Yu, Xiaohui
    To improve the efficiency and cost-effectiveness of handling large tabular datasets stored in databases, a range of data compression techniques are employed. Among these, dictionary-based compression methods such as LZ4, Gzip, and Zstandard are commonly used to reduce data size. However, while these traditional techniques can reduce data size to some degree, they cannot identify the internal patterns within a given dataset. There therefore remains substantial potential for further size reduction by identifying repetitive data patterns. This thesis proposes two novel approaches to improve tabular data compression performance. Both preprocess the data using locality-sensitive hashing (LSH), an encoding technique under which similar records are likely to receive the same hash values. One approach uses clustering for data reordering, while the other employs a heuristic solver for the Travelling Salesman Problem (TSP). The encoding step exposes internal repetitive patterns within the original datasets, so that after reordering, records with similar features are grouped together and compress to a much smaller size. Furthermore, a novel table partitioning strategy based on the number of distinct values in each column is designed to further improve the compression ratio of the entire table. Extensive experiments on one synthetic dataset and three real datasets evaluate the performance of the proposed algorithms under varying parameters of interest. The encoding and reordering methods yield significant efficiency improvements, reducing data size and substantially increasing compression ratios.
  • Open Access
    Revolutionizing Time Series Data Preprocessing with a Novel Cycling Layer in Self-Attention Mechanisms
    (2024-07-18) Chen, Jiyan; Yang, Zijiang
    This thesis presents a novel method for improving time series data preprocessing by incorporating a cycling layer into self-attention mechanisms. Traditional techniques often struggle to capture the cyclical nature of time series data, which limits the accuracy of predictive models. By integrating a cycling layer, this thesis aims to enhance the ability of models to recognize and use cyclical patterns within datasets, as demonstrated on the Jena Climate dataset from the Max Planck Institute for Biogeochemistry. Empirical results show that the proposed method not only improves forecast accuracy but also speeds up model fitting compared with conventional approaches. This thesis contributes to the advancement of time series analysis by offering a more effective preprocessing technique.
  • Open Access
    Integrating Natural Language and Visualizations for Exploring Data on Smartwatch
    (2024-07-18) Varadarajan, Kaavya; Prince, Enamul Hoque
    Smartwatches are increasingly popular for collecting and exploring personal data, including health, stock, and weather information. However, presenting such data with micro-visualizations is challenging because of the limited screen size and interactivity. To address this problem, we propose integrating natural language (voice) with micro-visualizations (charts) to enhance user comprehension and insight. Leveraging a large language model such as ChatGPT, we automatically summarize micro-visualizations and combine the summaries with audio narration and interactive visualizations to help users understand the data. A user study with sixteen participants suggests that combining voice and charts results in superior accuracy, preference, and usefulness compared with presenting charts alone. This highlights the efficacy of integrating natural language with visualizations on smartwatches to improve user interaction and data comprehension.
  • Open Access
    Machine learning algorithms for Long COVID effects detection
    (2024-03-16) Ahuja, Harit; Litoiu, Marin; Sergio, Lauren
    In the realm of the Internet of Things (IoT) and machine learning (ML), there is a growing demand for applications that can improve healthcare. Integrating sensors, cloud computing, and ML creates a powerful platform for gaining healthcare insights. Building on these concepts, we propose a novel approach to detecting the effects of long COVID, a widespread problem. We use a wearable device to capture electroencephalogram (EEG) readings, which are transformed through a series of processing steps into actionable decisions. Our methodology begins with data collection during a Cognitive-Motor Integration (CMI) task, followed by data preprocessing, feature engineering, and the application of ML and advanced deep learning (DL) algorithms. To address challenges such as data scarcity and privacy concerns, we generate synthetic data and train the same model on it for comparison with the original data. Our method was tested on real cases and achieved promising results: the CNN-LSTM model reached 83% accuracy with the original data and rose to 93% with synthetic data.
  • Open Access
    Unveiling the Complexities of Student Satisfaction in E-learning: An Integrated Framework for the Context of COVID-19
    (2024-03-16) Lin, Rui; Huang, Jimmy
    Amid the global pandemic’s reshaping of education, our study investigates e-learning dynamics in Canadian higher education. Integrating the Technology Acceptance Model (TAM), the DeLone and McLean Information Systems Success Model (D&M ISS), and the Expectation Confirmation Model (ECM), we introduce the C-RES (COVID-19 Remote E-learning System) framework, which addresses the complexities of e-learning systems and their role in student satisfaction during COVID-19. Through Structural Equation Modeling (SEM) analysis of responses from a diverse pool of graduate students across Canada, we uncover relationships among psychological factors, quality dimensions, and social influences. We demonstrate how self-efficacy, IT anxiety, and perceived system and information quality significantly influence students’ perceptions of ease of use and usefulness, which in turn affect their satisfaction with and commitment to Learning Management Systems (LMS). Our findings reveal that e-learning quality lies not only in technology but also in content, and they highlight the significant influence of individual confidence and community dynamics on student experiences. These insights provide actionable strategies for enhancing the effectiveness and resilience of e-learning systems, especially in crises. While this research focuses on the Canadian pandemic context, it suggests that future studies explore demographic influences. This thesis serves as a foundation for future e-learning research, pushing the boundaries of educational technology during global disruptions and offering key strategies for resilience and effectiveness in higher education.
  • Open Access
    Exploratory Analysis of Water Quality in a Small Urbanized Watershed Using Deep Learning
    (2023-12-08) Ofosu, Alfred; Erechtchoukova, Marina G.
    Water is a life-sustaining resource for organisms both inside and outside water bodies. Natural waters serve as municipal and industrial water supplies, sources of agricultural irrigation, habitats for aquatic ecosystems, and places for recreation, among other essential uses. The quality of water determines its use. Therefore, it must be monitored, managed, and reported to help stakeholders make decisions that protect watershed ecosystems and improve measures to mitigate factors adversely affecting water bodies. Water quality is represented by a set of parameters that describe specific characteristics or properties of water. These parameters are determined by measuring the water's physical and chemical characteristics and the concentrations of various substances in the water column, with subsequent laboratory analysis of samples. As a result, water quality parameters are observed at much lower frequencies than hydrometric and meteorological data. Many water quality monitoring systems collect between 4 and 12 samples per year, which motivates the use of modelling techniques to support decision-making. The study aims to develop a data-driven computational tool for water quality modelling in a small, highly urbanized watershed of the Don River, Ontario, Canada. It focuses on major ions, namely the cations calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), and potassium (K⁺), and anions such as bicarbonate (HCO₃⁻), carbonate (CO₃²⁻), chloride (Cl⁻), and sulphate (SO₄²⁻). These parameters are not significantly affected by the aquatic ecosystem; their dynamics are mainly determined by hydrological and meteorological processes. The study uses data from monitoring systems operated by the Toronto and Region Conservation Authority (TRCA) and Environment and Climate Change Canada (ECCC). The data comprise water quality parameters and hydrometric and meteorological characteristics observed in the watershed over 57 years.
    Concentrations of the selected water quality parameters are modelled using deep neural networks. A data pre-processing framework was developed for cleansing and integrating data observed at different frequencies and locations, and it is applied to a comparative analysis of neural networks of various configurations. Two sets of computational experiments were conducted. In the first, integrated data from all monitoring stations in the watershed were used to train a neural network to predict major ion concentrations for the upcoming month (t+1). In the second, upstream environmental parameters were used to train the model to predict major ion concentrations in the lower subwatershed. The study evaluates how accurately the developed models predict ion concentrations and provides insights into the relationship between environmental factors and water quality in the investigated watershed. The findings have practical applications for water resource management and pollution prevention.
  • Open Access
    Enhancing General Language Models for Biomedical Test Retrieval via Diversified Prior Knowledge
    (2023-12-08) Huang, Yizheng; Huang, Jimmy
    The thesis introduces the Diversified Prior Knowledge Enhanced General Language Model (DPK-GLM) to improve the efficacy of general language models in biomedical Information Retrieval (IR). General language models often struggle with biomedical data due to its specialized terminology and the need for precise matching. DPK-GLM tackles these challenges by integrating domain-specific knowledge, thereby enhancing the model's ability to understand and process biomedical information. The framework comprises three core components. The first, Knowledge-based Query Expansion, leverages authoritative biomedical databases to enrich search queries with domain-specific entities. The second, Aspect-based Filter, identifies documents that are highly relevant to the query. The third, Diversity-based Score Reweighting, re-ranks these filtered documents by combining similarity and diversity scores, yielding more accurate results. Experimental tests on public biomedical IR datasets confirm that DPK-GLM significantly improves retrieval performance.
  • Open Access
    Using Data Analytics and Machine Learning in Sustainable Forest Management from Remote Sensing Data
    (2023-08-04) Sysoeva, Polina; Khaiter, Peter A.
    Remote sensing has become a widely used technique for acquiring data for ecosystem service assessment (ESA) and other sustainable management practices. Remotely sensed data (RSD) are particularly crucial in locations where in situ observations are limited or impossible because of inaccessibility, such as mountainous areas. However, because of the unique features of RSD, obtaining substantial insights requires specific preprocessing steps and powerful computational algorithms such as machine learning (ML). In this research, we present a methodology that integrates RSD with data analytics and machine learning techniques for the needs of ESA. We develop a pipeline for preprocessing EOS data, transforming it into features, and experimenting with tuning the ML algorithms. A practical application of the proposed approach is demonstrated by assessing the impact of extreme weather events on forest ecosystems and their carbon sequestration abilities in two areas of the Kashmir Valley, Jammu & Kashmir, India.
  • Open Access
    Comparative Analysis of Transformer-Based Language Models for Text Analysis in the Domain of Sustainable Development
    (2023-08-04) Safwat, Nabil; Erechtchoukova, Marina G.
    With advances in artificial intelligence, Natural Language Processing (NLP) has gained considerable attention because of its potential to facilitate complex human-machine interactions, enhance language-based applications, and automate the processing of unstructured texts. The study investigates transfer learning for Transformer-based language models and abstractive text summarization, and applies them to the domain of sustainable development with the goal of determining how the Sustainable Development Goals (SDGs) are represented in scientific publications. To achieve this, the traditional transfer learning framework was extended as follows: (1) a means of evaluating the relevance of textual documents to a specified text was added; (2) two neural language models, BART and T5, were selected; and (3) eight text similarity measures were investigated to identify the most informative ones. Both models were fine-tuned on a domain-specific corpus of scientific publications extracted from Elsevier's Scopus database. The relevance of recently published works to an SDG was determined by calculating semantic similarity scores between each model-generated summary and the SDG’s description. The proposed framework made it possible to identify the goals that dominate the corpus and those that require further attention from the research community.
  • Open Access
    Dynamic Elastic Provisioning For NFV-Enabled 5G Networks Using Machine Learning
    (2023-03-28) Ali, Khalid; Jammal, Manar
    5G networks are expected to support a variety of services and applications with more stringent latency, reliability, and bandwidth requirements than previous generations. To meet these requirements, the Open Radio Access Network (O-RAN) has been proposed. The O-RAN Alliance assumes O-RAN components to be Virtualized Network Functions (VNFs). Furthermore, O-RAN allows Machine Learning (ML) solutions to be employed to tackle resource management challenges. However, intelligently managing O-RAN resources can prove challenging. Network providers need to scale resources dynamically in response to incoming traffic. Elastic resource allocation provides greater flexibility, reduces operational expenditure (OPEX), and increases resource utilization. In this work, we propose and evaluate an elastic VNF orchestration framework for O-RAN. The proposed system consists of an ML-based dynamic scaling scheme driven by traffic forecasting and a Reinforcement Learning (RL) based VNF placement policy. The models are evaluated on their predictive capabilities subject to all Service-Level Agreements (SLAs).
  • Open Access
    Integrating Precipitation Nowcasting in a Deep Learning-Based Flash Flood Prediction Framework and Assessing the Impact of Rainfall Forecasts Uncertainties
    (2022-12-14) Mhedhbi, Rim; Erechtchoukova, Marina G.
    Flash floods are among the most immediate and destructive natural hazards. To issue warnings on time, various attempts have been made to extend the forecast horizon of flash flood prediction models. In particular, introducing rainfall forecasts into process-based hydrological models has been found effective. However, integrating precipitation predictions into data-driven flash flood models has not yet been addressed. In this work, we propose a modeling framework that integrates rainfall nowcasts and assesses the impact of rainfall prediction uncertainties on a Deep Learning-based flash flood prediction model. Compared with the Persistence and ARIMA models, the LSTM model provided better rainfall nowcasting performance. Further, we propose an Encoder-Decoder LSTM-based model architecture for short-term flash flood prediction that supports rainfall forecasts. Computational experiments showed that future rainfall values improved flash flood predictability for extended lead times. We also found that rainfall underestimation had a significantly greater adverse effect on model performance than rainfall overestimation.
  • Open Access
    Design Approach for Building Technology with Indigenous Communities
    (2022-08-08) Rizvi, Alina; Chen, Stephen
    The growing demand for mobile platforms over the last decade has led to a widespread need for platform development methods. While existing methods work well for most mobile developers, one audience that can be neglected is urban Indigenous youth in Toronto. Drawing on experience, relationships, and an understanding of significant cultural practices and teachings, this study proposes a unique mobile development approach. The approach is tailored specifically to urban Indigenous youth in Toronto and draws mainly on the Anishinaabe Medicine Wheel, the 7 Grandparent Teachings, and Sharing Circles. The study also includes an experience report on how the approach worked in practice. Two mobile platforms were built using this approach, and both became popular applications within their respective target audiences. The approach focuses on the users and aims to make the target audience the main deciding factor in how the developed platform looks and functions. The motivation behind this study is to make technology less exclusive and more accessible to a diverse population.
  • Open Access
    Implementing Security Requirements through Automatic Generation of Secure Workflows
    (2022-08-08) Jaouhar, Ibrahim; Liaskos, Sotirios
    Modern software-intensive information systems are enormously large and complex. Before designing such systems, designers and architects need to know which stakeholder needs the system is supposed to support. This is particularly true for security requirements, which must be captured and analyzed alongside all other requirements rather than treated as an afterthought. Hence, many researchers have proposed modelling frameworks in various domains to address security and privacy patterns. However, most of these frameworks focus on comprehensive representation and analysis of requirements without indicating how such requirements can be implemented within the context of a business process. Users are often at a loss as to which security technologies they should adopt and incorporate into their workflows to achieve secure business processes. In this thesis, we propose a framework for enriching goal-oriented requirements models with the security controls that specified security requirements necessitate. A set of patterns, designed by security experts, associates abstract, domain-independent user goals and tasks with alternative workflows that achieve those goals with various levels of security. This translation is performed with the aid of an AI planner, SHOP2. Consequently, system analysts without deep experience in security technologies can see which steps and technologies are involved in making their designs more secure and implement them accordingly.
  • Open Access
    Data Analytics in Climate Change Studies
    (2022-08-08) Bhardwaj, Eshta; Khaiter, Peter A.
    The observed trends of climate change have wide ramifications for the sustainability of a normal lifestyle. Applying data analysis techniques can increase knowledge in climate studies while introducing additional research methods. The proposed research develops a novel data analytics framework that addresses a gap in data modeling for comparing climate model data with observational data. A detailed data pipeline for data collection, extraction, wrangling, analysis, and visualization is discussed. The practical implementation of the framework is presented through a visualization tool, Weather Analysis Regional Model, which aids researchers and practitioners in analyzing climate data.
  • Open Access
    Exploring the Effect of User Characteristics in Word Cloud Visualizations
    (2022-08-08) Shirin, Zehra; Hoque Prince, Enamul
    Word clouds are very popular for visually summarizing texts. While word clouds usually show word frequency using font size, recent studies have explored other possible design elements. However, there is still a gap in understanding how individual differences among users may affect their performance. This thesis aims to bridge this gap by answering two key research questions: which user characteristics affect performance with different variations of word clouds, and how word clouds could be adapted to different user characteristics. To answer these questions, we ran a user study in which participants took perceptual speed and verbal working memory tests, followed by 36 trials of a magnitude judgement task on word clouds. Results showed that user characteristics such as perceptual speed can significantly affect user performance. These results could inform future personalized word clouds suited to people with different user characteristics.
  • Open Access
    Using Data Mining Techniques to Assess the Impact of COVID-19 on the Auto Insurance Industry in China
    (2022-03-03) Wang, Jiangshan; Zhu, Huaiping
    Since coronavirus disease 2019 (COVID-19) was discovered at the end of 2019, the whole world has been severely affected. The insurance industry, regarded as an important factor in recovery, has also been affected by COVID-19. However, effective data mining techniques have rarely been used in the insurance industry in China, especially under the circumstances of COVID-19. Although some traditional statistical analysis methods have been applied in this area, they still cannot efficiently overcome the limitation posed by the lack of data distribution. This thesis addresses this limitation with a machine learning approach: a stacking model with strong generalization ability. ElasticNet, LightGBM, and Random Forest were employed as base learners; ridge and LASSO regression were used as meta-models to increase prediction accuracy; and SHAP values were used to explain the impact of COVID-19 on the insurance industry in China. The stacking meta-model achieves a mean absolute percentage error (MAPE) of 12.57134, compared with 21.50972 for the average value over the past week and 22.57935 for ElasticNet. The results indicate that COVID-19 has affected the auto insurance industry in China.
  • Open Access
    Factorized Construction of Machine Learning Methods over Normalized Data
    (2021-11-15) Zhang, Zhe; Yu, Xiaohui
    Enterprises are adopting machine learning to gain knowledge from vast amounts of data that are normalized and stored in relational databases. Features stored in different relations must be combined through join operations before being fed to machine learning processes. As a result, the redundancy that normalization avoids is reintroduced, incurring additional costs. This thesis proposes factorized algorithms (F-GMM, F-NN, and F-PPCA) for three widely used machine learning models, Gaussian mixture models (GMM), neural networks (NN), and probabilistic principal component analysis (PPCA), to eliminate the redundancy introduced by joins. With exact decomposition, training can be conducted much faster without any loss in accuracy. The efficiency improvement depends on the relative redundancy of the original relations. Finally, we conduct extensive experiments on both synthetic and real datasets to evaluate the performance of the proposed algorithms under varying parameters of interest. The factorized methods yield significant efficiency improvements, which increase as redundancy grows.
  • Open Access
    Exploring Topic Modeling in The Domain of Integrated Water Resource Management
    (2021-11-15) Kohli, Akshay Kumar; Erechtchoukova, Marina G.
    To achieve the United Nations Sustainable Development Goals, policy and decision-making should include Integrated Environmental Assessment (IEA). Water resources and their utilization play an important role in achieving these goals at all levels, from global to local. The sustainability of water resources is of paramount importance for achieving the United Nations’ long-term development goals. The sustainability of a resource is governed by the interplay of internal natural processes and biological, economic, and social systems, which makes water resource management a complex multidisciplinary problem that can be solved only by combining various approaches. The thesis explored the application of text mining techniques, namely topic modelling, to scientific publications in the sustainable water resource management domain, with the goal of identifying major research questions, practical problems, and the methodological approaches used to address these problems. A comparative analysis of approaches to building corpora and of model performance evaluation was conducted.
  • Open Access
    Privacy-Preserving Edge-Cloud Architecture for IoT Healthcare Systems
    (2021-11-15) Goyal, Payal; Litoiu, Marin
    With the surging demand for Internet of Things (IoT) healthcare applications, a myriad of data privacy concerns have come to light. Cloud computing carries the risk of exposing data to re-identification. Storing and processing data locally at the edge is a secure alternative, but the edge lacks the capacity to meet the needs of powerful machine learning (ML). An improved computing framework is required that combines ML capabilities with user-data confidentiality. We perform a systematic study of IoT healthcare systems and propose a three-tier architecture that protects data while enabling data sharing. The edge anonymizes data using differential privacy (DP) and transmits it to the cloud to train an ML classifier; the trained classifier is then sent back to the edge to make inferences. Our findings show that 1) the XGBoost classifier performs relatively well, and the accuracy of classifiers trained on DP data is close to that of classifiers trained on the original data; and 2) the round-trip execution performance of the architecture shows a high mean and variance at higher privacy budgets.
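The reordering idea in "Optimizing Data Compression via Data Reordering Strategies" (Du, 2024) can be illustrated with a minimal Python sketch. It assumes MinHash as the LSH family, treats each (column, value) pair of a row as a token, and simply sorts rows by their signatures so that identical or highly similar rows become adjacent. This is a crude stand-in, not the thesis's clustering or TSP-based ordering, and every function name here is hypothetical.

```python
import zlib
from typing import List, Sequence, Tuple

Record = Tuple[str, ...]


def minhash_signature(record: Record, num_hashes: int = 8) -> Tuple[int, ...]:
    """MinHash signature over the record's (column, value) tokens.

    Each hash function is simulated by prefixing the token with a seed and
    applying CRC32 (deterministic across runs, unlike Python's str hash).
    """
    tokens = [f"{col}={val}" for col, val in enumerate(record)]
    return tuple(
        min(zlib.crc32(f"{seed}:{tok}".encode()) for tok in tokens)
        for seed in range(num_hashes)
    )


def reorder_by_lsh(records: Sequence[Record], num_hashes: int = 8) -> List[Record]:
    """Sort rows by MinHash signature so rows sharing values tend to cluster.

    Assumption: a plain sort replaces the clustering / TSP ordering of the thesis.
    """
    return sorted(records, key=lambda r: minhash_signature(r, num_hashes))


def compressed_size(records: Sequence[Record]) -> int:
    """Bytes after serializing rows as comma-separated text and zlib-compressing."""
    text = "\n".join(",".join(r) for r in records)
    return len(zlib.compress(text.encode(), 9))
```

Comparing `compressed_size(rows)` with `compressed_size(reorder_by_lsh(rows))` on a real table is one way to see whether the reordering helps; a general-purpose compressor benefits most when similar rows fall within its match window.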
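The redundancy argument in "Factorized Construction of Machine Learning Methods over Normalized Data" (Zhang, 2021) can be seen on a small aggregate. The sketch below is an assumption-laden illustration, not F-GMM, F-NN, or F-PPCA: it computes the Gram matrix XᵀX, the kind of aggregate that covariance-based methods such as PCA rely on, for a hypothetical fact table S(fk, s) joined to a dimension table R(key, r), once by materializing the join and once from per-key aggregates of S. Referential integrity (every fk exists in R) is assumed.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# Hypothetical schema: S(fk, s) references R(key, r) through fk.
FactRows = List[Tuple[int, float]]
DimRows = List[Tuple[int, float]]
Gram = List[List[float]]


def gram_materialized(S: FactRows, R: DimRows) -> Gram:
    """X^T X for X = [s, r], computed by building the join S ⋈ R explicitly."""
    r_by_key: Dict[int, float] = dict(R)
    joined = [(s, r_by_key[fk]) for fk, s in S]  # r is repeated per matching S row
    g = [[0.0, 0.0], [0.0, 0.0]]
    for row in joined:
        for i in range(2):
            for j in range(2):
                g[i][j] += row[i] * row[j]
    return g


def gram_factorized(S: FactRows, R: DimRows) -> Gram:
    """Same X^T X from per-key aggregates of S, without building the join."""
    count: Counter = Counter()                      # number of S rows per key
    s_sum: Dict[int, float] = defaultdict(float)    # sum of s per key
    ss = 0.0                                        # sum of s^2 over all S rows
    for fk, s in S:
        count[fk] += 1
        s_sum[fk] += s
        ss += s * s
    # Each R row would appear once per matching S row in the join, so weight
    # it by the per-key aggregates instead of repeating it.
    sr = sum(s_sum[key] * r for key, r in R)
    rr = sum(count[key] * r * r for key, r in R)
    return [[ss, sr], [sr, rr]]
```

Both functions return the same matrix, but the factorized one touches each R row once regardless of how many S rows reference it, which is why the savings grow with the redundancy of the join.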
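The edge-side anonymization step in "Privacy-Preserving Edge-Cloud Architecture for IoT Healthcare Systems" (Goyal, 2021) can be pictured with the Laplace mechanism. The abstract does not say which DP mechanism was used, so this is a minimal sketch assuming Laplace noise on numeric features; `privatize_record` and its parameters are hypothetical. It shows why a larger privacy budget ε means less noise: the noise scale is sensitivity / ε.

```python
import random
from typing import List, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize_record(
    values: Sequence[float], sensitivity: float, epsilon: float, rng: random.Random
) -> List[float]:
    """Apply the Laplace mechanism independently to each numeric feature.

    Assumptions: each feature has the given L1 sensitivity, and each feature
    spends budget epsilon, so under basic composition the whole record costs
    len(values) * epsilon.
    """
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    scale = sensitivity / epsilon  # larger epsilon -> smaller noise, weaker privacy
    return [v + laplace_noise(scale, rng) for v in values]
```

Passing a seeded `random.Random` keeps experiments reproducible; a production deployment would need a cryptographically sound noise source and careful sensitivity analysis.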