
Information Systems and Technology

Recent Submissions

  • ItemOpen Access
    Exploratory Analysis of Water Quality in a Small Urbanized Watershed Using Deep Learning
    (2023-12-08) Ofosu, Alfred; Erechtchoukova, Marina G.
    Water is a life-sustaining resource for living organisms inside and outside water bodies. Natural waters serve as municipal and industrial water supplies, sources for agricultural irrigation, homes for aquatic ecosystems, recreation, and other essential uses. The quality of water determines its use. Therefore, it must be monitored, managed, and reported to help stakeholders make decisions that protect watershed ecosystems and improve measures to mitigate factors adversely affecting water bodies. Water quality is represented by a set of parameters that describe specific characteristics or properties of water. These parameters are determined by measuring water's physical and chemical characteristics and the concentrations of various substances in a water column, with subsequent sample analysis in laboratories. This results in low observation frequencies for water quality parameters compared to hydrometric and meteorological data. Observation frequencies adopted by many water quality monitoring systems vary between 4 and 12 samples per year, which motivates the use of modelling techniques to support decision-making. The study aims to develop a data-driven computational tool for water quality modelling in a small, highly urbanized watershed of the Don River, Ontario, Canada. The study focuses on major ions, namely, cations: calcium (Ca²⁺), magnesium (Mg²⁺), sodium (Na⁺), and potassium (K⁺), and anions such as bicarbonate (HCO₃⁻), carbonate (CO₃²⁻), chloride (Cl⁻), and sulphate (SO₄²⁻). These parameters are not affected significantly by the aquatic ecosystem. Their dynamics are determined mainly by hydrological and meteorological processes. The study uses data from different monitoring systems belonging to the Toronto and Region Conservation Authority (TRCA) and Environment and Climate Change Canada (ECCC). The data consist of water quality parameters and hydrometric and meteorological characteristics observed in the watershed over 57 years.
Concentrations of selected water quality parameters are modelled using deep neural networks. The data pre-processing framework for cleansing and integrating data observed at different frequencies from different locations is developed. The framework is applied for the comparative analysis of neural networks of various configurations. Two sets of computational experiments were conducted. In the first set of experiments, integrated data from all monitoring stations in the watershed was fed into the deep learning algorithms to train a neural network to predict the concentration of major ions for the upcoming month (t+1). The second set of experiments uses upstream environmental parameters to train the model and predict the major ion concentrations in the lower subwatershed. The study investigates the performance of developed models in accurately predicting ion concentrations and provides insights into the relationship between environmental factors and water quality in the investigated watershed. The findings have practical applications for water resource management and pollution prevention efforts.
  • ItemOpen Access
    Enhancing General Language Models for Biomedical Text Retrieval via Diversified Prior Knowledge
    (2023-12-08) Huang, Yizheng; Huang, Jimmy
    The thesis introduces the Diversified Prior Knowledge Enhanced General Language Model (DPK-GLM) to improve the efficacy of general language models in biomedical Information Retrieval (IR). General language models often struggle with biomedical data due to its specialized terminology and the need for precise matching. DPK-GLM tackles these challenges by integrating domain-specific knowledge, thereby enhancing the model's ability to understand and process biomedical information. The framework comprises three core components. The first, Knowledge-based Query Expansion, leverages authoritative biomedical databases to enrich search queries with domain-specific entities. The second, Aspect-based Filter, identifies documents that are highly relevant to the query. The third, Diversity-based Score Reweighting, re-ranks these filtered documents by combining similarity and diversity scores, yielding more accurate results. Experimental tests on public biomedical IR datasets confirm that DPK-GLM significantly improves retrieval performance.
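The abstract does not say how similarity and diversity scores are combined. One common way to do this kind of re-ranking is a greedy, MMR-style linear mix, sketched below under that assumption. The `weight` parameter, the vector representation, and the function names are hypothetical and are not taken from DPK-GLM.

```python
import math
from typing import Dict, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rerank(query_vec: List[float], doc_vecs: Dict[str, List[float]],
           weight: float = 0.7) -> List[str]:
    """Greedily order documents by weight * similarity + (1 - weight) * diversity."""
    selected: List[str] = []
    remaining = dict(doc_vecs)
    while remaining:
        def score(doc_id: str) -> float:
            similarity = cosine(query_vec, remaining[doc_id])
            # Diversity = 1 - highest similarity to any already selected document.
            redundancy = max(
                (cosine(remaining[doc_id], doc_vecs[s]) for s in selected),
                default=0.0,
            )
            return weight * similarity + (1.0 - weight) * (1.0 - redundancy)

        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        del remaining[best]
    return selected
```

With `weight=1.0` the ordering reduces to plain similarity ranking; lower values push near-duplicate documents further down the list.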
  • ItemOpen Access
    Using Data Analytics and Machine Learning in Sustainable Forest Management from Remote Sensing Data
    (2023-08-04) Sysoeva, Polina; Khaiter, Peter A.
    Remote sensing has become a widely used technique for acquiring data for ecosystem service assessment (ESA) and other sustainable management practices. Remotely Sensed Data (RSD) is particularly crucial in locations where in situ observations are limited or impossible because the sites are inaccessible, such as mountainous areas. However, due to the unique features of RSD, obtaining substantial insights requires specific preprocessing steps and strong computational algorithms, such as machine learning (ML). In this research, we present a methodology integrating RSD with data analytics and machine learning techniques for the needs of ESA. A pipeline is developed for preprocessing EOS data, transforming it into features, and experimenting with tuning of the ML algorithms. A practical application of the proposed approach is demonstrated by assessing the impact of extreme weather events on forest ecosystems and their carbon sequestration abilities in two areas of the Kashmir Valley, Jammu & Kashmir, India.
  • ItemOpen Access
    Comparative Analysis of Transformer-Based Language Models for Text Analysis in the Domain of Sustainable Development
    (2023-08-04) Safwat, Nabil; Erechtchoukova, Marina G.
    With advances in Artificial Intelligence, Natural Language Processing (NLP) has gained a lot of attention because of its potential to facilitate complex human-machine interactions, enhance language-based applications, and automate the processing of unstructured texts. The study investigates the transfer learning approach for Transformer-based language models, the abstractive text summarization approach, and their application to the domain of Sustainable Development, with the goal of determining how the Sustainable Development Goals (SDGs) are represented in scientific publications using text summarization. To achieve this, the traditional transfer learning framework was expanded in three ways: (1) the relevance of textual documents to a specified text can be evaluated, (2) two neural language models, BART and T5, were selected, and (3) eight text similarity measures were investigated to identify the most informative ones. Both the BART and T5 models were fine-tuned on a domain-specific corpus of scientific publications extracted from the Elsevier Scopus database. The relevance of recently published works to an SDG was determined by calculating semantic similarity scores between each model-generated summary and the SDG’s description. The proposed framework made it possible to identify goals that dominated the developed corpus and those that require further attention from the research community.
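The relevance step described above (scoring a generated summary against an SDG description) can be sketched with a very simple measure: cosine similarity over term-frequency vectors. This is an illustrative stand-in and not necessarily one of the eight measures the thesis investigated. The function names are assumptions, and any goal descriptions passed in are supplied by the caller.

```python
import math
import re
from collections import Counter
from typing import Dict


def tf_vector(text: str) -> Counter:
    """Lower-case the text and count alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = tf_vector(text_a), tf_vector(text_b)
    dot = sum(va[term] * vb[term] for term in va.keys() & vb.keys())
    norm = (math.sqrt(sum(c * c for c in va.values()))
            * math.sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def most_relevant_goal(summary: str, goal_descriptions: Dict[str, str]) -> str:
    """Return the key of the goal whose description is most similar to the summary."""
    return max(
        sorted(goal_descriptions),  # sorted() makes ties deterministic
        key=lambda goal: cosine_similarity(summary, goal_descriptions[goal]),
    )
```

A lexical measure like this misses paraphrases, which is one reason a study might compare several similarity measures rather than rely on one.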
  • ItemOpen Access
    Dynamic Elastic Provisioning For NFV-Enabled 5G Networks Using Machine Learning
    (2023-03-28) Ali, Khalid; Jammal, Manar
    5G networks are expected to support a variety of services and applications with more stringent latency, reliability, and bandwidth requirements than previous generations. To meet these requirements, the Open Radio Access Network (O-RAN) has been proposed. The O-RAN Alliance assumes O-RAN components to be Virtualized Network Functions (VNFs). Furthermore, O-RAN allows employing Machine Learning (ML) solutions to tackle challenges in resource management. However, intelligently managing resources for O-RAN can prove challenging. Network providers need to dynamically scale resources in response to incoming traffic. Elastically allocating resources provides higher flexibility, reduces OPerational EXpenditure (OPEX), and increases resource utilization. In this work, we propose and evaluate an elastic VNF orchestration framework for O-RAN. The proposed system consists of a traffic forecasting-based dynamic scaling scheme using ML and a Reinforcement Learning (RL)-based VNF placement policy. The models are evaluated based on their predictive capabilities subject to all Service-Level Agreements.
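The idea of scaling ahead of demand from a traffic forecast can be shown with a toy sketch. The thesis uses ML forecasting; the moving average here is only a placeholder. The per-instance capacity, the `headroom` margin, and all function names are assumptions for illustration.

```python
import math
from typing import List


def moving_average_forecast(history: List[float], window: int = 3) -> float:
    """Placeholder forecaster: mean of the most recent `window` load samples."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def required_instances(forecast_load: float, capacity_per_instance: float,
                       headroom: float = 0.2, min_instances: int = 1) -> int:
    """Number of VNF instances needed to serve the forecast load plus a safety margin."""
    needed = math.ceil(forecast_load * (1.0 + headroom) / capacity_per_instance)
    return max(min_instances, needed)
```

For example, a forecast of 120 load units with 50 units of capacity per instance and 20% headroom would call for three instances. Scaling in and out then amounts to comparing this number with the current instance count.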
  • ItemOpen Access
    Integrating Precipitation Nowcasting in a Deep Learning-Based Flash Flood Prediction Framework and Assessing the Impact of Rainfall Forecasts Uncertainties
    (2022-12-14) Mhedhbi, Rim; Erechtchoukova, Marina G.
    Flash floods are among the most immediate and destructive natural hazards. To issue warnings on time, various attempts have been made to extend the forecast horizon of flash flood prediction models. In particular, introducing rainfall forecasts into process-based hydrological models was found effective. However, integrating precipitation predictions into data-driven flash flood models has not yet been addressed. We propose a modeling framework that integrates rainfall nowcasts and assesses the impact of rainfall prediction uncertainties on a Deep Learning-based flash flood prediction model. Compared to the Persistence and ARIMA models, the LSTM model provided better rainfall nowcasting performance. Further, we propose an Encoder-Decoder LSTM-based model architecture for short-term flash flood prediction that supports rainfall forecasts. Computational experiments showed that future rainfall values improved flash flood predictability for extended lead times. We also found that rainfall underestimation had a significantly more adverse effect on the model’s performance than rainfall overestimation.
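The Persistence model mentioned above is the usual naive baseline for nowcasting: the forecast for a future step is simply the latest observed value. A minimal sketch of that baseline follows. The function names and the choice of mean absolute error as the metric are assumptions, since the abstract does not state which error measure was used.

```python
from typing import List


def persistence_forecast(series: List[float], lead: int = 1) -> List[float]:
    """Forecast each value as the observation made `lead` steps earlier."""
    return series[:-lead]


def mean_absolute_error(observed: List[float], predicted: List[float]) -> float:
    """Average absolute difference between observations and predictions."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


def persistence_mae(series: List[float], lead: int = 1) -> float:
    """Score the persistence baseline on a series for a given lead time."""
    predicted = persistence_forecast(series, lead)
    observed = series[lead:]  # the values the forecasts are compared against
    return mean_absolute_error(observed, predicted)
```

Any learned nowcasting model would need to beat this score at the same lead time to be considered useful.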
  • ItemOpen Access
    Design Approach for Building Technology with Indigenous Communities
    (2022-08-08) Rizvi, Alina; Chen, Stephen
    An increase in demand for mobile platforms in the last decade has led to a widespread need for platform development methods. While these methods work well for a majority of mobile developers, one audience that can be neglected is the urban Indigenous population of youth in Toronto. Through experience, relationships and an understanding of significant cultural practices and teachings, this study proposes a unique mobile development approach. This approach is tailored specifically towards urban Indigenous youth in Toronto, incorporating the Anishinaabe Medicine Wheel, 7 Grandparent Teachings, and Sharing Circles as main influences. It also features an experience report of how the mobile development approach worked in practice. Two mobile platforms were built using this approach and achieved successful results, with both becoming popular applications within their respective target audiences. This approach places a focus on the users and essentially aims to have the target audience be the main deciding factor in how the developed platform looks and functions. The motivation behind this study is to make technology less exclusive and more accessible to a diverse population.
  • ItemOpen Access
    Implementing Security Requirements through Automatic Generation of Secure Workflows
    (2022-08-08) Jaouhar, Ibrahim; Liaskos, Sotirios
    Modern software-intensive information systems are enormously large and complex. Prior to the design process of such systems, designers and architects need to know what kinds of stakeholder needs the system is supposed to support. This is particularly true for security requirements, which must be captured and analyzed alongside all other requirements rather than treated as an afterthought. Hence, many researchers have proposed modelling frameworks in different domains to address security and privacy patterns. However, most of these frameworks focus on comprehensive representation and analysis of requirements, without indicating how such requirements can be implemented within the context of a business process. Users are often at a loss with regard to what security technologies they should adopt and incorporate in their workflows to reach secure business processes. In this thesis, we propose a framework for enriching goal-oriented requirements models with security controls necessitated by specified security requirements. A set of patterns is designed by security experts to associate abstract domain-independent user goals/tasks with alternative workflows that achieve those goals with various levels of security. This translation is performed with the aid of an AI planner, SHOP2. Consequently, system analysts with no deep experience in security technologies can acquire a view of what steps and technologies are involved in making their designs more secure and implement them accordingly.
  • ItemOpen Access
    Data Analytics in Climate Change Studies
    (2022-08-08) Bhardwaj, Eshta; Khaiter, Peter A.
    The observed trends of climate change have wide ramifications for the sustainability of a normal lifestyle. The application of data analysis techniques can increase the knowledge around climate studies while introducing additional research methods. The proposed research showcases the development of a novel data analytics framework to address the gap in data modeling for the comparison of climate model data and observational data. A detailed data pipeline for data collection, extraction, wrangling, analysis, and visualization is discussed. The practical implementation of the framework is presented through a visualization tool, Weather Analysis Regional Model, to aid researchers and practitioners in their analysis of climate data.
  • ItemOpen Access
    Exploring the Effect of User Characteristics in Word Cloud Visualizations
    (2022-08-08) Shirin, Zehra; Hoque Prince, Enamul
    Word clouds are very popular for visually summarizing texts. While word clouds usually show the frequency of words using font size, recent studies have explored other possible design elements. However, there is still a gap in understanding how individual differences among users may impact their performance. This thesis aims to bridge this gap by answering two key research questions: which user characteristics are impacted by different variations of word clouds, and how can word clouds be adapted to different user characteristics? To answer these questions, we ran a user study in which participants performed perceptual speed and verbal working memory tests followed by 36 trials of the magnitude judgement task for word clouds. Results showed that user characteristics like perceptual speed can significantly impact the performance of users. These results can be useful in the future to provide personalized word clouds that are suitable for people with different user characteristics.
  • ItemOpen Access
    Using Data Mining Techniques to Assess the Impact of COVID-19 on the Auto Insurance Industry in China
    (2022-03-03) Wang, Jiangshan; Zhu, Huaiping
    Since coronavirus disease 2019 (COVID-19) was discovered at the end of 2019, the whole world has been severely affected. The insurance industry, regarded as an important factor in recovery, has also been affected by COVID-19. However, effective data mining techniques have rarely been utilized in the insurance industry in China, especially under the circumstances of COVID-19. Although some traditional statistical analysis methods have been applied to this area, the limitation of the lack of data distribution still cannot be efficiently overcome. With the machine learning technique proposed in this thesis, this limitation can be solved by using a stacking model with great generalization ability. In this research, the ElasticNet, LightGBM, and Random Forest approaches were employed as base learners; ridge and LASSO regression were used as meta-models to increase the prediction accuracy; and the SHAP value was utilized to explain the impact of COVID-19 on the insurance industry in China. The stacking meta-model in this thesis has a mean absolute percentage error (MAPE) of 12.57134, whereas the average value in the past week is 21.50972, and the MAPE of ElasticNet is 22.57935. In conclusion, COVID-19 affects the auto insurance industry in China.
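The models above are compared by mean absolute percentage error (MAPE). For reference, here is a small sketch of the standard definition, expressed in percent. Skipping zero-valued actuals is a choice made for this illustration and is not something the abstract specifies.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent, over non-zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when every actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

Lower values are better, so a model scoring about 12.6 has roughly half the relative error of one scoring about 22.6 on the same data.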
  • ItemOpen Access
    Factorized Construction of Machine Learning Methods over Normalized Data
    (2021-11-15) Zhang, Zhe; Yu, Xiaohui
    Enterprises are adopting machine learning to gain knowledge from vast amounts of data, which are normalized and stored in relational databases. All the features required in different relations must be combined through join operations and fed to machine learning processes. As a result, the redundancy avoided by normalization is reintroduced, which incurs additional costs. This thesis proposes factorized algorithms (F-GMM, F-NN and F-PPCA) for three widely used machine learning scenarios (GMM, NN and PPCA) to eliminate the redundancy introduced by the joins. The training process can be conducted much faster without any loss in accuracy for the exact decomposition. The efficiency improvement depends on the relative redundancy of the original relations. Finally, we design extensive experiments on both synthetic and real datasets to evaluate the performance of the proposed algorithms by varying parameters of interest. The factorized method yields significant efficiency improvements, which increase as redundancy grows.
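The general idea of factorizing a computation over a join can be shown on something much simpler than the thesis's F-GMM, F-NN, or F-PPCA algorithms: a linear score over two relations. The sketch assumes a fact table `S` whose rows carry a foreign key into a dimension table `R`. All names and the data layout are hypothetical.

```python
from typing import Dict, List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def scores_materialized(S: List[Tuple[List[float], str]], R: Dict[str, List[float]],
                        w_s: List[float], w_r: List[float]) -> List[float]:
    """Join first (copying R's features into every matching S row), then score."""
    joined = [xs + R[fk] for xs, fk in S]  # redundancy is reintroduced here
    w = w_s + w_r
    return [dot(w, row) for row in joined]


def scores_factorized(S: List[Tuple[List[float], str]], R: Dict[str, List[float]],
                      w_s: List[float], w_r: List[float]) -> List[float]:
    """Score R's part once per R row, then reuse it for every S row that references it."""
    partial = {rid: dot(w_r, xr) for rid, xr in R.items()}
    return [dot(w_s, xs) + partial[fk] for xs, fk in S]
```

Both functions return the same scores. The factorized version does less work when many `S` rows point at the same `R` row, which matches the observation above that the gain grows with redundancy.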
  • ItemOpen Access
    Exploring Topic Modeling in The Domain of Integrated Water Resource Management
    (2021-11-15) Kohli, Akshay Kumar; Erechtchoukova, Marina G.
    To successfully achieve the United Nations Sustainable Development Goals, policy and decision making should include Integrated Environmental Assessment (IEA). Water resources and their utilization play an important role in achieving these goals at all levels from global to local. Sustainability of a water resource is of paramount importance for achieving the United Nations' long-term development goals. Sustainability of a resource is governed by the interplay of inner natural processes and biological, economic and social systems, making management of a water resource a complex multidisciplinary problem which can be solved only by combining various approaches. The thesis explored the application of text mining techniques, namely topic modelling, to scientific publications in the sustainable water resource management domain, with the goal of identifying major research questions, practical problems and methodological approaches used to address these problems. Comparative analysis of approaches to building corpora and model performance evaluations were conducted.
  • ItemOpen Access
    Privacy-Preserving Edge-Cloud Architecture for IoT Healthcare Systems
    (2021-11-15) Goyal, Payal; Litoiu, Marin
    With the surging demand for Internet of Things (IoT) healthcare applications, a myriad of data privacy concerns come to light. Cloud computing inherits the risks of exposing data to re-identification vulnerabilities. A secure solution is storing and processing data locally on the edge, but the edge cannot meet the needs of powerful machine learning (ML). An improved computing framework is required to incorporate ML capabilities and user-data confidentiality. We perform a systematic study of IoT healthcare systems and propose a three-tier architecture that protects data and enables data sharing. The edge anonymizes data using differential privacy (DP), transmits it to the cloud to train an ML classifier, and receives the trained classifier back to make inferences. Our findings show that (1) the XGBoost classifier performs relatively well, and the accuracy of classifiers trained on DP data is close to that of classifiers trained on the original data; and (2) the round-trip execution performance of the architecture shows a high mean and variance with higher privacy budgets.
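The abstract does not state which differential privacy mechanism the edge tier uses. As an illustration of how a privacy budget controls the amount of noise, here is a sketch of the widely used Laplace mechanism for a single numeric reading. The function names and the per-value framing are assumptions.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def privatize(value: float, sensitivity: float, epsilon: float,
              rng: random.Random) -> float:
    """Add Laplace noise with scale sensitivity / epsilon to a numeric value."""
    if epsilon <= 0:
        raise ValueError("epsilon (the privacy budget) must be positive")
    return value + laplace_noise(sensitivity / epsilon, rng)
```

A smaller `epsilon` means stronger privacy and more noise. That is the trade-off behind comparing classifiers trained on DP data with classifiers trained on the original data.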
  • ItemOpen Access
    Experimental analysis on the operation of Particle Swarm Optimization
    (2021-07-06) Yadollahpour, Naeemeh; Chen, Stephen
    In Particle Swarm Optimization, it has been observed that swarms often stall as opposed to converge. A stall occurs when all of the forward progress that could occur is instead rejected as Failed Exploration. Since the swarm's particles are in good regions of the search space with the potential to make more progress, the introduction of perturbations to the pbest positions can lead to significant improvements in the performance of standard Particle Swarm Optimization. The pbest perturbation has been supported by a line search technique that can identify unimodal, globally convex, and non-globally convex search spaces, as well as the approximate size of the attraction basin. A deeper analysis of the stall condition reveals that it involves clusters of particles that are performing exploitation, and these clusters are separated by individual particles that are performing exploration. This stall pattern can be identified by a newly developed method that is efficient, accurate, real-time, and search space independent. A more targeted (heterogeneous) modification for stall is presented for globally convex search spaces.
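For readers unfamiliar with pbest, here is a minimal sketch of a textbook global-best PSO with an inertia weight. It is not the thesis's modified algorithm or its stall-detection method. The coefficient values, the test function, and the `rejected` counter (how often a new position fails to improve a particle's pbest) are illustrative assumptions, and that counter is only loosely related to the stall notion described above.

```python
import random
from typing import Callable, List, Tuple


def sphere(x: List[float]) -> float:
    """Simple unimodal test function with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


def pso(objective: Callable[[List[float]], float], dim: int = 2, particles: int = 10,
        iterations: int = 50, w: float = 0.7, c1: float = 1.5, c2: float = 1.5,
        seed: int = 0) -> Tuple[float, float, int]:
    """Run global-best PSO; return (initial best, final best, rejected pbest updates)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5.0, 5.0) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [objective(p) for p in pos]
    initial_best = min(pbest_fit)
    rejected = 0
    for _ in range(iterations):
        gbest = pbest[pbest_fit.index(min(pbest_fit))]  # best pbest so far
        for i in range(particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            fit = objective(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit  # accept the improvement
            else:
                rejected += 1  # the new position did not beat this particle's pbest
    return initial_best, min(pbest_fit), rejected
```

Because pbest values are only ever replaced by better ones, the final best can never be worse than the initial best. The open question is how much of the search effort ends up in the `rejected` branch.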
  • ItemOpen Access
    A Natural Language Question Answering System for Exploring Online Conversations
    (2021-03-08) Siddiqui, Nadia Ashfaq; Prince, Enamul Hoque
    The proliferation of social media has resulted in the exponential growth of online conversations. Due to the volume and complexity of conversations, it is often extremely difficult to gain insights from such conversations. This dissertation hypothesizes that synergetic integration of natural language processing with information visualization techniques can help users to better fulfill their information needs. More specifically, we developed a question-answering method that allows the user to ask questions about a conversation and then automatically answers the question by highlighting results in a visual interface. The visual interface, named ConVisQA, was developed by extending ConVis which visually summarizes a conversation by providing an overview of topics and sentiment information. We demonstrate the effectiveness of our approach through a user study with blog readers. The dissertation concludes with a user study comparing our interface with a traditional interface for blog reading as well as considerations for future work.
  • ItemOpen Access
    PrivateMe: Managing Privacy in Multiple Applications and Devices
    (2021-03-08) Huber, Christianne; Litoiu, Marin
    Applications that tailor information to the user rely on data being collected and communicated. This may lead to privacy concerns about personal data and how it can be used. Even when privacy controls are available, it is not always clear which settings control the data collection. Furthermore, with the volume of data that is being collected, it is not always obvious how the data is collected. Other researchers have proposed solutions to assist the user to manage privacy. Yet there is a need for a solution that will support privacy management on different devices, for multiple applications and web services. PrivateMe includes a privacy goal model to capture privacy goals, a generic taxonomy of privacy settings and permissions, and an ontology to store reusable knowledge about how the privacy settings and permissions interact. PrivateMe is evaluated with three use cases that show its applicability to managing privacy on multiple devices.
  • ItemOpen Access
    How Software Transparency Can Mitigate Conflicts Among Different Stakeholders in the Animal Experimentation Domain
    (2020-08-11) Chen, Ren-Luen; Cysneiros, Luiz Marcio
    Arguments over whether animals should be used in experiments have existed for decades. Stakeholders such as animal advocates, scientists, and mediators have been calling for more transparency to tackle the conflicts. They all claim that being transparent is a way to understand each other and to see the whole picture of animal experiments. It is believed that laboratory software should provide aspects of transparency to help mitigate the conflicts among different stakeholders' points of view. In this thesis, a Systematic Literature Review was conducted to collect requirements and potential solutions from the literature from the perspectives of different stakeholders and put them together in a set of softgoal interdependency graphs (SIGs) that illustrate the possible solutions to achieve transparency. The resulting SIGs may help laboratories adopt software that provides a level of transparency for the research process, and they will also help to mitigate current problems involving researchers, mediators, and groups opposed to the use of animals.
  • ItemOpen Access
    Machine Learning Approach to Predict Treatment Outcome Using Shockwave Lithotripsy in Management of Urinary Stone
    (2020-08-11) Moghisi, Reihaneh; Huang, Xiangji
    In Ontario, shock wave lithotripsy (SWL) is a regionalized resource, and St. Michael's Hospital (SMH) is one of only three centers in the province offering this service. As such, many patients travel a great distance to receive this noninvasive treatment. Our objective is to implement an ensemble learning technique to predict treatment outcome based on patients' demographic information and stone characteristics. To construct a rigorous machine learning model that can be confidently applied to assist in the decision-making process, we built our model on the whole dataset of patients aged over 18 for the years 1998 to 2016. Our goal is to build a classification model that predicts treatment outcome using SWL prior to any decision on treatment modality. Success or failure was based on whether the same patient had a retreatment plan within less than 90 days of the initial treatment. We also compared the accuracy of six machine learning algorithms on the dataset using a t-test at a 95% confidence level. In addition, we performed a retrospective comparison of three shock wave lithotripsies (SWL) that have been used at SMH during the past two decades in terms of their success. Furthermore, we looked at changing trends over time in stone size and location, and in patient BMI, site of origin, gender, and age, among others.
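The outcome definition above (failure when the same patient has a retreatment within less than 90 days of the initial treatment) can be written as a small labelling function. This is a sketch of that rule only. The function name, the argument layout, and the exclusion of same-day records are assumptions about how the data might be organized.

```python
from datetime import date
from typing import List


def treatment_succeeded(initial: date, retreatments: List[date],
                        window_days: int = 90) -> bool:
    """Label a treatment as failed if any retreatment occurs within the window."""
    return not any(
        0 < (retreatment - initial).days < window_days
        for retreatment in retreatments
    )
```

Labels produced this way would serve as the binary target for the classifiers being compared.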
  • ItemOpen Access
    Comparing Representations of Contribution Labels in Goal Models
    (2020-05-11) Tambosi, Wisal Yousef S.; Liaskos, Sotirios
    Goal models have been proposed to be an effective method to support decision making in early requirements engineering. Key to using them is the concept of contribution links that represent how the satisfaction of one goal affects that of another. Multiple proposals have been offered for representing contribution; however, the degree to which users can intuitively understand the meaning behind contribution representations and utilize them appropriately has not been thoroughly studied. This work reports the results of an experimental study that compares the intuitiveness of two contribution representation approaches by measuring the performance of untrained users and exploring the role of individual differences (cognitive styles and arithmetic attitude and ability) in establishing the right intuition. Results show significant differences between the two representations as well as effects of various levels of individual factors. The results inspire further research on contribution links and support the operationalizability of intuitiveness as a criterion for evaluating conceptual modelling language designs.