Information Systems and Technology

Permanent URI for this collection

https://hdl.handle.net/10315/27588

Open Access
A Hybrid Approach for Large-Scale Product Categorization Based on Weighted KNN and LSTM-BPV
(2019-12-04) Hu, Haohao; Huang, Xiangji
In modern e-commerce systems, large volumes of new items are added to the product list every day, which calls for automatic product categorization. In this thesis we propose a weighted K-Nearest Neighbour (KNN) based classification system for solving the large-scale e-commerce product taxonomy classification problem. We use information retrieval (IR) models as the similarity function in our weighted KNN algorithm. Among all IR models used in this study, we achieved the highest classification performance by using the information-based (IB) model as the similarity function in the KNN algorithm. Moreover, our proposed method can improve the overall performance when its prediction results are combined with those from an advanced neural network based method, namely Long Short-Term Memory with Balanced Pooling Views (LSTM-BPV). The hybrid system could achieve results comparable to the state of the art (SotA). We also obtain good results by fine-tuning a pre-trained Bidirectional Encoder Representations from Transformers (BERT) model.
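The weighted KNN idea in the abstract above can be illustrated with a minimal sketch. This is not the thesis implementation: the thesis uses IR models (such as the information-based model) as the similarity function, whereas the `jaccard` function, the toy `catalog`, and the value of `k` below are placeholder assumptions chosen only to keep the example self-contained.

```python
from collections import defaultdict

def jaccard(a, b):
    """Placeholder similarity (an assumption): token-set overlap of two titles."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)

def weighted_knn(query, labeled_items, k=3, similarity=jaccard):
    """Pick the category with the largest summed similarity among the k nearest items."""
    scored = sorted(
        ((similarity(query, text), label) for text, label in labeled_items),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for score, label in scored:
        votes[label] += score  # each neighbour votes with its similarity as weight
    return max(votes, key=votes.get)

# Hypothetical toy catalog of (product title, category) pairs.
catalog = [
    ("usb c charging cable", "Electronics"),
    ("wireless phone charger", "Electronics"),
    ("cotton crew neck t-shirt", "Clothing"),
    ("slim fit cotton shirt", "Clothing"),
]
prediction = weighted_knn("fast usb c phone charger", catalog)
```

Because `similarity` is a parameter, any scoring function that takes two texts and returns a number could be swapped in without changing the voting logic.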
Open Access
Automatic Image Recognition of Rapid Malaria Emergency Diagnosis: A Deep Neural Network Approach
(2018-03-01) Liang, Zhaohui; Huang, Xiangji
Deep learning is the state-of-the-art artificial intelligence (AI) method for visual pattern detection and automated diagnosis. This paper describes the application of a convolutional neural network (CNN), the deep learning model for visual recognition, to the automatic detection of Plasmodium-parasitized red blood cells for malaria field screening and rapid diagnosis. The malaria thin blood smears are from Bangladesh and were initially labeled by a specialist. 27,578 red blood cell images are segmented (raw set). The images are rotated clockwise three times to generate an augmented dataset with 110,312 red blood cell images. Two CNN-based Malaria Net models, one with 12 layers and one with 18 layers, are applied to classify both the raw dataset and the augmented dataset. The performance is evaluated by ten-fold cross-validation and compared to a transfer learning model. In the ten-fold cross-validation test for Malaria Net, the average accuracy is 97.37% (18-layer) and 96.09% (12-layer) with the raw set, and 97.93% and 96.75% with the augmented set, in comparison to 91.99% with the raw set and 94.26% with the augmented set in transfer learning. In addition, the two CNN models show superiority over transfer learning in all performance indicators, such as sensitivity, specificity, precision, F1 score, and Matthews correlation coefficient. The Malaria Net can accurately detect malaria-infected red blood cells. A CNN model trained on domain-specific data shows superior performance over the transfer-learning method. Automatic image classification powered by deep learning offers not only an accurate method for malaria field screening and rapid diagnosis but also a new solution for malaria control, especially in resource-poor regions.
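The augmentation arithmetic in the abstract above (27,578 raw images becoming 110,312) follows from keeping each original image alongside its three rotations. The sketch below illustrates this on a tiny grid; the 90-degree rotation step is an assumption, since the abstract states only that images are rotated clockwise three times, and the helper names are hypothetical.

```python
def rotate_clockwise(image):
    """Rotate a 2-D grid (list of rows) by 90 degrees clockwise (assumed step)."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the original image plus its three successive clockwise rotations."""
    views = [image]
    for _ in range(3):
        views.append(rotate_clockwise(views[-1]))
    return views

# A toy 2x2 "image" standing in for a segmented red blood cell.
cell = [[1, 2],
        [3, 4]]
views = augment(cell)

raw_count = 27578                         # raw set size from the abstract
augmented_count = raw_count * len(views)  # four views per raw image
```

With four views per raw image, `augmented_count` matches the 110,312 images reported for the augmented set.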
Open Access
Exploiting semantics for improving clinical information retrieval
(2016-06-23) Babashzadeh, Atanaz; Huang, Xiangji
Clinical information retrieval (IR) presents several challenges, including terminology mismatch and granularity mismatch. One of the main objectives in clinical IR is to fill the semantic gap between queries and documents and to go beyond keyword matching. To address these issues, in this study we use semantic information to improve the performance of clinical IR systems by representing queries in an expressive and meaningful context. We propose two novel approaches to modeling medical query contexts. The first approach models medical query contexts by mining semantic-based association rules (AR) for improving clinical text retrieval. The query context is derived from the rules that cover the query and is then weighted according to their semantic relatedness to the query concepts. In our second approach we model a representative query context by developing a query domain ontology. To develop the query domain ontology we extract all the concepts that have a semantic relationship with the query concept(s) in UMLS ontologies. The query context represents concepts extracted from the query domain ontology and weighted according to their semantic relatedness to the query concept(s). The query context is then exploited in patient record query expansion and re-ranking to improve clinical retrieval performance. We evaluate this approach on the TREC Medical Records dataset. Results show that our proposed approach significantly improves retrieval performance compared to a classic keyword-based IR model.
Open Access
Improvement in Probabilistic Information Retrieval Model: Rewarding Terms with High Relative Term Frequency
(2016-11-25) Zhu, Runjie; Huang, Xiangji
In this thesis, I propose integrating relative term frequency into traditional probabilistic models. Specifically, I introduce a set of three influence functions that apply relative term frequency to model and enhance the performance of the fundamental probabilistic weighting function, BM25. The study aims to exploit the properties of the combination of relative term frequency and BM25. The extensive experiments and analyses conducted in the thesis are based on six of the official TREC datasets, and the results show a significant improvement in retrieval effectiveness. The information retrieval system adopted is built on the Okapi Basic Search System (BSS), which offers a reliable and effective packaged framework for running the experiments and yielding an end-to-end retrieval workflow.
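For readers unfamiliar with the baseline that the abstract above builds on, here is a minimal sketch of a common BM25 formulation. It does not include the thesis's relative term frequency influence functions, which the abstract does not specify. The whitespace tokenization, the parameter defaults `k1=1.2` and `b=0.75`, the non-negative IDF variant, and the omission of the query term frequency component are all simplifying assumptions.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a simplified BM25."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Number of documents containing each term.
    doc_freq = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            df = doc_freq[term]
            # Non-negative IDF variant (an assumption, not the thesis's choice).
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Term frequency saturation with document length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Hypothetical toy collection.
docs = [
    "malaria blood smear image",
    "query expansion for clinical retrieval",
    "probabilistic retrieval with term frequency",
]
scores = bm25_scores("term frequency retrieval", docs)
```

In this toy collection the third document matches all three query terms and therefore scores highest, while the first matches none and scores zero.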
Open Access
Machine Learning Approach to Predict Treatment Outcome Using Shockwave Lithotripsy in Management of Urinary Stone
(2020-08-11) Moghisi, Reihaneh; Huang, Xiangji
In Ontario, shock wave lithotripsy (SWL) is a regionalized resource, and St. Michael's Hospital (SMH) is one of only three centers in the province offering this service. As such, many patients travel a great distance to receive this noninvasive treatment. Our objective is to use an ensemble learning technique to build a classification model that predicts the outcome of SWL treatment, based on the patients' demographic information and stone characteristics, before any decision on treatment modality is made. To construct a rigorous machine learning model that can be confidently applied to assist in the decision-making process, we built our model on the whole dataset of patients aged over 18 for the years 1998 to 2016. Success or failure was determined by whether a retreatment plan existed for the same patient within 90 days of the initial treatment. We also compared the accuracy of six machine learning algorithms on the dataset using a t-test with a 95% confidence interval. In addition, we performed a retrospective comparison of the success of three shock wave lithotripsies (SWL) that have been used at SMH during the past two decades. Furthermore, we looked at trends over time in stone size, stone location, patient BMI, site of origin, gender, age, etc.
Open Access
Measuring Short Text Semantic Similarity with Deep Learning Models
(2018-11-21) Ge, Jun; Huang, Xiangji
Natural language processing (NLP), a subfield of artificial intelligence (AI), is the ability of a computer program to understand human language as it is spoken. The development of NLP applications is challenging because computers traditionally require humans to "speak" to them in a programming language that is precise, unambiguous and highly structured, or through a limited number of clearly enunciated voice commands. We study the use of deep learning models, the state-of-the-art AI method, for the problem of measuring short text semantic similarity in the NLP area. In particular, we propose a novel deep neural network architecture to identify semantic similarity for pairs of question sentences. In the proposed network, multiple channels of knowledge for pairs of question texts can be utilized to improve the representation of text. A dense layer is then used to learn a classifier for identifying duplicated question pairs. In extensive experiments on the Quora test collection, our proposed approach shows significant improvement over strong baselines, which verifies the effectiveness of the deep models as well as the proposed deep multi-channel framework.
Open Access
Re-ranking Real-time Web Tweets to Find Reliable and Influential Twitterers
(2014-07-09) Al Sinan, Ahmed Husain; Huang, Xiangji
Twitter is a powerful social media tool for sharing information on different topics around the world. Following different users/accounts is the most effective way to get information propagated on Twitter. Due to Twitter's limited search and lack of navigation support, searching Twitter is not easy, and finding reliable information requires effort. This thesis proposes a new methodology to rank tweets based on their authority, with the goal of helping users identify influential Twitterers. This methodology, HIRKM rank, draws on PageRank, Alexa Rank, whether a tweet is an original tweet or a retweet, and the use of hashtags to determine the authority of each tweet. The method is applied to rank the TREC 2011 microblogging dataset, which contains over 16 million tweets, based on 50 predefined topics. The result is a list of tweets relevant to the users' search queries, presented in descending order of authority, which will be evaluated using TREC's official gold standard for the microblogging dataset.
Open Access
Rewarding the Location of Terms in Sentences to Enhance Probabilistic Information Retrieval
(2017-07-27) Liu, Baiyan; Huang, Xiangji
In most traditional retrieval models, the weight (or probability) of a query term is estimated based on its own distribution or statistics. Intuitively, however, nouns are more important in information retrieval and are more often found near the beginning and the end of sentences. In this thesis, we investigate the effect on information retrieval of rewarding terms based on their location in sentences. In particular, we propose a kernel-based method to capture the term placement pattern, from which a novel Term Location retrieval model is derived in combination with the BM25 model to enhance probabilistic information retrieval. Experiments on five TREC datasets of varied size and content indicate that the proposed model significantly outperforms the optimized BM25 and DirichletLM in MAP over all datasets with all kernel functions, and outperforms the optimized BM25 and DirichletLM over most of the datasets in P@5 and P@20 with different kernel functions.
Open Access
The Influence of Demographic Factors on the Cybersecurity Awareness Level of Individuals in an Academic Environment
(2019-11-22) Vazquez, Juan Carlos Barrera; Huang, Xiangji
Nowadays the notion of cybersecurity has claimed center stage in the daily life of individuals and organizations. Losses incurred from cyber-attacks are the result of faulty human interactions with new information and communication technologies (ICTs) in the context of cyberspace. The fast pace of technological discovery has surpassed the understanding of most ICT users. Consequently, individuals become unaware of such changes in different ways. This research examines differences and/or relationships in the awareness level of individuals towards cybersecurity issues, considering four basic demographic factors: Gender, Age, Education, and Employment. The data set for this study originated from university students pursuing a bachelor's degree in information systems and/or information technology. The results from this study are not conclusive and cannot be generalized due to several natural research limitations. However, several observations found in this study may contribute to the general body of knowledge on cybersecurity and stimulate future research.
Open Access
Using Learning to Rank Approach to Promoting Diversity for Biomedical Information Retrieval with Wikipedia
(2014-07-28) Wu, Jiajin; Huang, Xiangji
Most traditional information retrieval (IR) models adopt the independent relevance assumption, which holds that the relevance of a document is independent of other documents. However, the pitfall of this assumption is high redundancy and low diversity in the retrieval results. This has been seen in many scenarios, especially in biomedical IR, where the information need of one query may refer to different aspects. Promoting diversity in IR takes the relationship between documents into account. Unlike previous studies, we tackle this problem from the learning-to-rank perspective. The main challenges are how to find salient features for biomedical data and how to integrate dynamic features into the ranking model. To address these challenges, Wikipedia is used to detect the topics of documents for generating diversity-biased features. A combined model is proposed and studied to learn a diversified ranking result. Experimental results show that the proposed method outperforms baseline models.
Open Access
Using Semantic-Based User Profile Modeling for Context-Aware Personalised Place Recommendations
(2015-08-28) Yalamarti, Sushma; Huang, Xiangji
Place Recommendation Systems (PRSs) are used to recommend places to visit to World Wide Web users. Existing PRSs are still limited by several problems, among them recommending a similar set of places to different users (Lack of Personalization) and a lack of diversity in the set of recommended items (Content Overspecialization). One of the main objectives of PRSs, or contextual suggestion systems, is to fill the semantic gap between queries and suggestions and to go beyond keyword matching. To address these issues, in this study we attempt to build a personalized context-aware place recommender system using semantic-based user profile modeling, in order to address the limitations of current user profile building techniques and to improve the retrieval performance of personalized place recommender systems. This approach consists of building a place ontology based on the Open Directory Project (ODP), a hierarchical ontology scheme for organizing websites. We model a semantic user profile from the place concepts extracted from the place ontology and weighted according to their semantic relatedness to user interests. The semantic user profile is then exploited to devise a personalized recommendation by re-ranking the initial search results to improve retrieval performance. We evaluate this approach on a dataset obtained using the Google Places API. Results show that our proposed approach significantly improves retrieval performance compared to a classic keyword-based place recommendation model.

Browsing Information Systems and Technology by Author "Huang, Xiangji"