Information Systems and Technology
Browsing Information Systems and Technology by Issue Date
Now showing 1 - 20 of 56
Item Open Access: Adaptive Mechanisms for Mobile Spatio-Temporal Applications (2014-07-09)
Theodorou, Vasileios; Litoiu, Marin

Mobile spatio-temporal applications play a key role in many mission-critical fields, including Business Intelligence, Traffic Management and Disaster Management. They are characterized by high data volume and velocity and by a large and variable number of mobile users. The design and implementation of these applications should not only consider this variability, but also support other quality requirements such as performance and cost. In this thesis we propose an architecture for mobile spatio-temporal applications that enables multiple angles of adaptivity. We also introduce a two-level adaptation mechanism that ensures system performance while facilitating scalability and context-aware adaptivity. We validate the architecture and the adaptation mechanisms by implementing a road quality assessment mobile application as a use case and by performing a series of experiments in a cloud environment. We show that our proposed architecture can adapt at runtime and maintain service level objectives while offering cost-efficiency and robustness.

Item Open Access: Re-ranking Real-time Web Tweets to Find Reliable and Influential Twitterers (2014-07-09)
Al Sinan, Ahmed Husain; Huang, Xiangji

Twitter is a powerful social media tool for sharing information on different topics around the world. Following different users/accounts is the most effective way to receive the information propagated on Twitter. Due to Twitter's limited search capabilities and lack of navigation support, finding reliable information on Twitter requires considerable effort. This thesis proposes a new methodology to rank tweets based on their authority, with the goal of helping users identify influential Twitterers. This methodology, HIRKM rank, is influenced by PageRank and determines the authority of each tweet using Alexa Rank, whether the tweet is an original tweet or a retweet, and its use of hashtags.
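A minimal sketch of how such authority signals might be combined into a single tweet score; the weights, normalizations, and function below are illustrative assumptions, not the actual HIRKM formulation.

```python
def tweet_authority_score(author_pagerank, alexa_rank, is_retweet, num_hashtags,
                          w_pr=0.5, w_alexa=0.3, w_orig=0.1, w_tags=0.1):
    """Combine author and tweet signals into one authority score.

    All weights and normalizations below are illustrative assumptions.
    """
    # Alexa rank: lower is better, so invert it into (0, 1].
    alexa_signal = 1.0 / (1.0 + alexa_rank)
    # Original tweets are rewarded over retweets.
    originality = 0.0 if is_retweet else 1.0
    # Diminishing returns for piling on hashtags.
    tag_signal = num_hashtags / (1.0 + num_hashtags)
    return (w_pr * author_pagerank + w_alexa * alexa_signal
            + w_orig * originality + w_tags * tag_signal)

def rank_tweets(tweets):
    """Sort tweets (dicts of the signals above) by descending authority."""
    return sorted(tweets, key=lambda t: tweet_authority_score(**t), reverse=True)
```

Under this scheme an original tweet always outscores an otherwise identical retweet, which matches the intent of rewarding original content.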
This method is applied to rank the TREC 2011 microblogging dataset, which contains over 16 million tweets, based on 50 predefined topics. The result is a list of tweets relevant to the user's search query, presented in descending order of authority, and it is evaluated using TREC's official gold standard for the microblogging dataset.

Item Open Access: A Time-Aware Approach to Improving Ad-hoc Information Retrieval from Microblogs (2014-07-09)
Amin Nayeri, Zahra; Huang, Xiangji (Jimmy)

An immense number of short-text documents is produced as a result of microblogging. The content grows as the number of microbloggers grows, and as active microbloggers continue to post millions of updates. The range of topics discussed is so vast that microblogs provide an abundance of useful information. In this work, the problem of retrieving the most relevant information in microblogs is addressed. Interesting temporal patterns were found in the initial analysis of the study. The focus of the current work is therefore first to exploit a temporal variable in order to see how effectively it can be used to predict the relevance of tweets, and then to include it in a retrieval weighting model along with other tweet-specific features. Generalized Linear Mixed-effect Models (GLMMs) are used to analyze the features and to propose two re-ranking models. These two models were developed through an exploratory process on a training set and then evaluated on a test set.

Item Open Access: Using Learning to Rank Approach to Promoting Diversity for Biomedical Information Retrieval with Wikipedia (2014-07-28)
Wu, Jiajin; Huang, Xiangji

Most traditional information retrieval (IR) models make the independent relevance assumption: the relevance of a document is assumed to be independent of that of other documents. The pitfall of this assumption is high redundancy and low diversity in the retrieval results.
This has been observed in many scenarios, especially in biomedical IR, where the information need behind one query may refer to several different aspects. Promoting diversity in IR takes the relationships between documents into account. Unlike previous studies, we tackle this problem from a learning-to-rank perspective. The main challenges are how to find salient features for biomedical data and how to integrate dynamic features into the ranking model. To address these challenges, Wikipedia is used to detect the topics of documents and generate diversity-biased features. A combined model is proposed and studied to learn a diversified ranking result. Experimental results show that the proposed method outperforms the baseline models.

Item Open Access: A Methodology for Eliciting and Ranking Control Points for Adaptive Systems (2014-07-28)
Zoghi, Parisa; Litoiu, Marin

Designing an adaptive system to meet its quality constraints in the face of environmental uncertainties, such as variable demands, can be a challenging task. In a cloud environment, a designer also has to consider and evaluate different control points, i.e., those variables that affect the quality of the software system. This thesis presents a method for eliciting, evaluating and ranking control points for web applications deployed in cloud environments. The proposed method consists of several phases that take a high-level stakeholder adaptation goal and transform it into lower-level MAPE-K loop control points. The MAPE-K loop is then activated at runtime using an adaptation algorithm. We conducted several experiments to evaluate the different phases of the methodology, and we report the results and the lessons learned.

Item Open Access: Automating Software Customization via Crowdsourcing using Association Rule Mining and Markov Decision Processes (2015-01-26)
Hamidi, Saeideh; Liaskos, Sotirios

As systems grow in size and complexity, so do their configuration possibilities.
Users of modern systems are easily confused and overwhelmed by the number of choices they need to make in order to fit their systems to their exact needs. In this thesis, we propose a technique for selecting what information to elicit from the user so that the system can recommend the maximum number of personalized configuration items. Our method is based on constructing configuration elicitation dialogs by utilizing crowd wisdom. A set of configuration preferences, in the form of association rules, is first mined from a crowd configuration data set. Possible configuration elicitation dialogs are then modeled as a Markov Decision Process (MDP). Within the model, association rules are used to automatically infer configuration decisions based on knowledge already elicited earlier in the dialog. This way, an MDP solver can search for elicitation strategies that maximize the expected number of automated decisions, thereby reducing elicitation effort and increasing user confidence in the result. We conclude by reporting the results of a case study in which this method is applied to the privacy configuration of Facebook.

Item Open Access: An Approach to Designing Clusters for Large Data Processing (2015-08-28)
Sandel, Roni; Litoiu, Marin

Cloud computing is increasingly being adopted due to its cost savings and its ability to scale. As data continues to grow rapidly, an increasing number of institutions are adopting NoSQL clusters to address the storage and processing demands of large data. However, evaluating and modelling NoSQL clusters presents many challenges. To address some of these challenges, this thesis proposes a methodology for designing and modelling large-scale processing configurations that respond to end-user requirements. First, goals are established for the big data cluster; in this thesis, we use performance and cost as our goals. Second, the data is transformed from a relational schema to an appropriate HBase schema.
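As an illustration of the second step, a relational traffic reading might be flattened into an HBase-style row key and column family. The row-key design, column family, and qualifier names below are hypothetical, not the schema used in the thesis.

```python
def to_hbase_row(segment_id, timestamp, speed_kmh, vehicle_count):
    """Map one relational traffic reading onto an HBase-style cell layout.

    Row-key design (hypothetical): segment id + reversed timestamp, so a
    scan over one segment returns the newest readings first.
    """
    max_ts = 10 ** 10                  # fixed-width timestamp bound
    reversed_ts = max_ts - timestamp   # newest-first ordering within a segment
    row_key = f"{segment_id:0>8}#{reversed_ts:010d}"
    # One column family ("m") for measurements; qualifiers name each metric.
    cells = {
        "m:speed_kmh": str(speed_kmh),
        "m:vehicle_count": str(vehicle_count),
    }
    return row_key, cells
```

Because HBase sorts rows lexicographically by key, the reversed timestamp makes a prefix scan on a segment id return the most recent readings first, a common row-key design choice.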
In the third step, we iteratively deploy different clusters. We then model the clusters and evaluate different topologies (instance size, number of instances, number of clusters, etc.). We use HBase as the large data processing cluster, and we evaluate our methodology on traffic data from a large city and on a distributed community cloud infrastructure.

Item Open Access: A Methodology for Eliciting and Ranking Control Points for Adaptive Systems (2015-08-28)
Zoghi, Parisa; Litoiu, Marin

Designing an adaptive system to meet its quality constraints in the face of environmental uncertainties, such as variable demands, can be a challenging task. In a cloud environment, a designer also has to consider and evaluate different control points, i.e., those variables that affect the quality of the software system. This thesis presents a method for eliciting, evaluating and ranking control points for web applications deployed in cloud environments. The proposed method consists of several phases that take a high-level stakeholder adaptation goal and transform it into lower-level MAPE-K loop control points. The MAPE-K loop is then activated at runtime using an adaptation algorithm. We conducted several experiments to evaluate the different phases of the methodology, and we report the results and the lessons learned.

Item Open Access: Using Semantic-Based User Profile Modeling for Context-Aware Personalised Place Recommendations (2015-08-28)
Yalamarti, Sushma; Huang, Xiangji

Place Recommendation Systems (PRSs) are used to recommend places to visit to World Wide Web users. Existing PRSs are still limited by several problems, among them recommending a similar set of places to different users (lack of personalization) and a lack of diversity in the set of recommended items (content overspecialization). One of the main objectives in PRSs, or contextual suggestion systems, is to fill the semantic gap between queries and suggestions and to go beyond keyword matching.
To address these issues, in this study we attempt to build a personalized, context-aware place recommender system that uses semantic-based user profile modeling to address the limitations of current profile-building techniques and to improve retrieval performance. The approach consists of building a place ontology based on the Open Directory Project (ODP), a hierarchical ontology scheme for organizing websites. We model a semantic user profile from the place concepts extracted from the place ontology, weighted according to their semantic relatedness to the user's interests. The semantic user profile is then exploited in a re-ranking process over the initial search results to produce personalized recommendations and improve retrieval performance. We evaluate this approach on a dataset obtained using the Google Places API. Results show that our proposed approach significantly improves retrieval performance compared to a classic keyword-based place recommendation model.

Item Open Access: An Empirical Study on the Role of Requirement Engineering in Agile Method and Its Impact on Quality (2015-08-28)
Rahman, Anzira; Cysneiros, Luiz Marcio

Agile methods are characterized as flexible and easily adaptable. The need to keep up with multiple high-priority projects and shorter time-to-market demands could explain their increasing popularity. It also raises the concern of whether the use of these methods jeopardizes quality. Since Agile methods allow for changes throughout the process, they also create opportunities for software quality to be affected at any time. This thesis examines the process of requirements engineering as performed with Agile methods in terms of its similarities to and differences from requirements engineering as performed with the more traditional Waterfall method. It compares both approaches from a software quality perspective using a case study of 16 software projects.
The main contribution of this work is to bring empirical evidence from real-life cases illustrating how Agile methods significantly impact software quality, including the potential for a larger number of defects due to poor elicitation of non-functional requirements.

Item Open Access: Efficient Calculation of Optimal Configuration Processes (2015-12-16)
Fernandez, Yasser Gonzalez; Chen, Stephen; Liaskos, Sotirios

Customers are becoming increasingly involved in the design of the products and services they choose by specifying their desired characteristics. As a result, configuration systems have become essential technologies for supporting the development of mass-customization business models. These technologies facilitate the configuration of complex products and services that could otherwise generate many incorrect configurations and overwhelm users with confusion. This thesis studies the problem of optimizing user interaction in a configuration process, that is, minimizing the number of questions a user must be asked in order to obtain a fully specified product or service configuration. The work builds upon a previously existing framework for optimizing the process of configuring a software system, and focuses on improving its efficiency and generalizing its application to a wider range of configuration domains. Two solution methods, along with two alternative ways of specifying the configuration models, are proposed and studied on different configuration scenarios. The experimental study shows that the introduced solutions overcome the limitations of the existing framework, resulting in algorithms better suited to models involving a large number of configuration variables.

Item Open Access: Exploiting semantics for improving clinical information retrieval (2016-06-23)
Babashzadeh, Atanaz; Huang, Xiangji

Clinical information retrieval (IR) presents several challenges, including terminology mismatch and granularity mismatch.
One of the main objectives in clinical IR is to fill the semantic gap between queries and documents and to go beyond keyword matching. To address these issues, in this study we attempt to use semantic information to improve the performance of clinical IR systems by representing queries in an expressive and meaningful context. We propose query context modeling to improve the effectiveness of clinical IR systems, with two novel approaches to modeling medical query contexts. The first approach models medical query contexts by mining semantic-based association rules to improve clinical text retrieval. The query context is derived from the rules that cover the query and is weighted according to the rules' semantic relatedness to the query concepts. In our second approach, we model a representative query context by developing a query domain ontology. To develop the query domain ontology, we extract all the concepts that have a semantic relationship with the query concept(s) in UMLS ontologies. The query context comprises concepts extracted from the query domain ontology, weighted according to their semantic relatedness to the query concept(s). The query context is then exploited for query expansion and re-ranking over patient records to improve clinical retrieval performance. We evaluate this approach on the TREC Medical Records dataset. Results show that our proposed approach significantly improves retrieval performance compared to a classic keyword-based IR model.

Item Open Access: An Initial Analysis on the Impact of Software Transparency and Privacy on a Healthcare Environment (2016-09-20)
Zinovatna, Olena; Cysneiros, Luiz Marcio

Transparency and privacy are two fundamental parts of any democratic society. Although both are essential in today's environment, they often conflict: allowing more transparency is likely to impact privacy, while preserving privacy often reduces transparency.
With the constantly evolving nature of information technology and the tremendous amount of data generated daily, there is a growing need to balance privacy and transparency. The purpose of this work is to understand the current state of software transparency and privacy, as well as how they are perceived in the workplace. This thesis focuses on three objectives. First, it supports the development of catalogues documenting existing privacy concerns and how they relate to transparency. Second, it narrows its focus to the healthcare domain. Lastly, it evaluates the current state of software transparency in existing health information systems.

Item Open Access: Integrating Classification With K-Means to Detect E-Commerce Transaction Anomaly (2016-09-20)
Tan, Xing; Yang, Zijiang Cynthia

Effective data mining solutions are needed in Electronic Commerce (E-Commerce) transaction anomaly detection to accurately identify anomalous transaction records. However, many E-Commerce transaction anomaly detection models are sub-optimal due to highly imbalanced data sets. This thesis proposes a K-means-based meta-clustering algorithm, applied as a preprocessing method, to address the problem of highly imbalanced data. The main aim is to generate a collection of clusters from the E-Commerce transaction anomaly data set, each of which contains similar instances. Logistic Regression, Naive Bayes, RBFNetwork and NBTree classifiers are then applied to evaluate the generated clusters. Results indicate that the proposed method can be easily realized and achieves excellent performance.
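As a minimal pure-Python sketch of the preprocessing idea, the following partitions 2-D transaction feature vectors into clusters of similar instances with plain K-means. The feature representation, the value of k, and the deterministic initialization are illustrative assumptions; the thesis's meta-clustering procedure is more involved.

```python
def kmeans(points, k, iters=20):
    """Plain K-means on 2-D points. Uses the first k points as initial
    centroids (a simple deterministic initialization)."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: (p[0] - centroids[c][0]) ** 2
                              + (p[1] - centroids[c][1]) ** 2,
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, labels

def partition(points, k=2):
    """Preprocessing step: group transactions into clusters of similar instances."""
    _, labels = kmeans(points, k)
    clusters = {}
    for p, l in zip(points, labels):
        clusters.setdefault(l, []).append(p)
    return clusters
```

Each resulting cluster can then be handed to the downstream classifiers, so that every classifier trains on more homogeneous, less imbalanced data.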
Most importantly, the proposed method deals well with imbalanced data sets and minimizes type-II errors.

Item Open Access: Toward Autonomic Data-Oriented Scalability in Cloud Computing Environments (2016-09-20)
Zareian, Saeed; Litoiu, Marin

The applications deployed in modern data centers are highly diverse in terms of architecture and performance needs, and it is a challenge to provide consistent service to all applications in a shared environment. This thesis proposes a generic analytical engine that can optimize the use of cloud-based resources according to service needs in an autonomic manner. The proposed system is capable of ingesting the large amounts of data generated by the various monitoring services within data centers. By transforming that data into actionable knowledge, the system can then make the decisions necessary to maintain a desired level of quality of service. The contributions of this work are as follows. First, we define a scalable architecture to collect the metrics and store the data. Second, we design and implement a process for building prediction models that characterize application performance, using data mining and statistical techniques. Lastly, we evaluate the accuracy of the prediction models.

Item Open Access: Improvement in Probabilistic Information Retrieval Model: Rewarding Terms with High Relative Term Frequency (2016-11-25)
Zhu, Runjie; Huang, Xiangji

In this thesis, I propose integrating relative term frequency into traditional probabilistic models; in other words, I introduce a set of three influence functions that apply relative term frequency to model and enhance the performance of the fundamental probabilistic weighting function, BM25. The study aims to exploit the properties of combining relative term frequency with BM25.
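A minimal sketch of the general idea: a standard BM25 term weight scaled by an influence function of relative term frequency (tf divided by document length). The single linear influence function and the alpha parameter below are illustrative assumptions; the thesis introduces three specific influence functions that are not reproduced here.

```python
import math

def bm25_weight(tf, df, doc_len, avg_doc_len, n_docs, k1=1.2, b=0.75):
    """Standard BM25 term weight (Lucene-style non-negative idf)."""
    idf = math.log((n_docs - df + 0.5) / (df + 0.5) + 1.0)
    norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + norm)

def bm25_with_relative_tf(tf, df, doc_len, avg_doc_len, n_docs, alpha=0.5):
    """BM25 scaled by an illustrative influence function of relative term
    frequency; terms that dominate their document get a boost."""
    rel_tf = tf / doc_len
    influence = 1.0 + alpha * rel_tf
    return bm25_weight(tf, df, doc_len, avg_doc_len, n_docs) * influence
```

Because the influence factor is always at least 1 for any occurring term, the modified weight never falls below plain BM25, and the boost grows with the term's share of the document.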
The extensive experiments and analyses conducted in the thesis are based on six official TREC datasets, and the results show a significant improvement in retrieval effectiveness. The information retrieval system adopted is built on the Okapi Basic Search System (BSS), which offers a reliable and effective framework for running the experiments and yields an end-to-end retrieval workflow.

Item Open Access: Towards an Ontology-Based Approach for Reusing Non-Functional Requirements Knowledge (2017-07-27)
Veleda, Rodrigo Da Rocha Vaughan; Cysneiros, Luiz Marcio

Requirements engineering plays a crucial role in the software development process. Many works have pointed out that non-functional requirements (NFRs) are currently more important than functional requirements. NFRs can be very complicated to understand due to their diversity and subjective nature. The NDR Framework has been proposed to fill some of the existing gaps and to facilitate NFR elicitation and modeling. In this thesis, we introduce a tool that plays a major role in the NDR Framework by allowing software engineers to store and reuse NFR knowledge. The NDR Tool converts the knowledge contained in Softgoal Interdependency Graphs (SIGs) into a machine-readable format that follows the NFR and Design Rationale (NDR) Ontology. It also provides mechanisms to query the knowledge base and produces graphical representations of the results obtained. To evaluate whether our approach aids in eliciting NFRs, we conducted an experiment based on a software development scenario.

Item Open Access: Rewarding the Location of Terms in Sentences to Enhance Probabilistic Information Retrieval (2017-07-27)
Liu, Baiyan; Huang, Xiangji

In most traditional retrieval models, the weight (or probability) of a query term is estimated based on its own distribution or statistics.
Intuitively, however, nouns are more important in information retrieval and are found more often near the beginning and end of sentences. In this thesis, we investigate the effect on information retrieval of rewarding terms based on their location in sentences. In particular, we propose a kernel-based method to capture this term placement pattern, from which a novel Term Location retrieval model is derived and combined with the BM25 model to enhance probabilistic information retrieval. Experiments on five TREC datasets of varied size and content indicate that the proposed model significantly outperforms the optimized BM25 and DirichletLM in MAP over all datasets with all kernel functions, and surpasses the optimized BM25 and DirichletLM in P@5 and P@20 over most of the datasets with different kernel functions.

Item Open Access: Using K-means Clustering and Similarity Measure to Deal with Missing Rating in Collaborative Filtering Recommendation Systems (2018-03-01)
Xiong, Chenrui; Yang, Zijiang Cynthia

Collaborative Filtering recommendation systems have been developed to address the information overload problem and to personalize content for users, businesses and organizations. However, the Collaborative Filtering approach suffers from data sparsity and online scalability problems, which result in low recommendation quality. In this thesis, a novel Collaborative Filtering approach is introduced using clustering and similarity technologies. The proposed method uses K-means clustering to partition the entire dataset, which reduces the time complexity and improves online scalability as well as data density. Moreover, a similarity comparison method predicts and fills in the missing values in the sparse dataset to enhance data density, which boosts recommendation quality.
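A minimal sketch of the missing-rating step, assuming user-based cosine similarity over co-rated items; the exact similarity measure and prediction rule in the thesis may differ.

```python
import math

def cosine_sim(ratings_a, ratings_b):
    """Cosine similarity between two users over their co-rated items."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    dot = sum(ratings_a[i] * ratings_b[i] for i in common)
    na = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    nb = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0

def predict_rating(user, item, all_ratings):
    """Similarity-weighted average of other users' ratings for `item`,
    used to fill a missing entry for `user`; None if no neighbour rated it."""
    num = den = 0.0
    for other, ratings in all_ratings.items():
        if other == user or item not in ratings:
            continue
        s = cosine_sim(all_ratings[user], ratings)
        num += s * ratings[item]
        den += abs(s)
    return num / den if den else None
```

Filling the sparse user-item matrix with such predictions raises the data density that the clustering and recommendation steps then operate on.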
This thesis uses the MovieLens dataset to investigate the proposed method, which yields strong experimental results on a large, sparse data set, achieving higher quality with lower time complexity than traditional Collaborative Filtering approaches.

Item Open Access: Automatic Image Recognition of Rapid Malaria Emergency Diagnosis: A Deep Neural Network Approach (2018-03-01)
Liang, Zhaohui; Huang, Xiangji

Deep learning is the state-of-the-art artificial intelligence (AI) method for visual pattern detection and automated diagnosis. This paper describes the application of the convolutional neural network (CNN), the deep learning model for visual recognition, to the automatic detection of Plasmodium-parasitized red blood cells for malaria field screening and rapid diagnosis. The malaria thin blood smears are from Bangladesh and were initially labeled by a specialist; 27,578 red blood cell images are segmented (the raw set). The images are rotated clockwise three times to generate an augmented dataset with 110,312 red blood cell images. A 12-layer and an 18-layer CNN-based Malaria Net model are applied to classify both the raw and the augmented datasets. The performance is evaluated by ten-fold cross-validation and compared to a transfer learning model. In the ten-fold cross-validation test for Malaria Net, the average accuracy is 97.37% (18-layer) and 96.09% (12-layer) on the raw set, and 97.93% and 96.75% on the augmented set, compared to 91.99% on the raw set and 94.26% on the augmented set with transfer learning. In addition, the two CNN models show superiority over transfer learning in all performance indicators, including sensitivity, specificity, precision, F1 score, and Matthews correlation coefficient. Malaria Net can accurately detect malaria-infected red blood cells, and a CNN model trained on domain-specific data shows superior performance over the transfer-learning method.
Automatic image classification powered by deep learning offers not only an accurate method for malaria field screening and rapid diagnosis but also a new approach to malaria control, especially in resource-poor regions.
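The rotation-based augmentation described above (each image rotated clockwise three times, quadrupling 27,578 raw images into 110,312) can be sketched in a few lines; a nested list stands in for the pixel array.

```python
def rotate_cw(image):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(images):
    """Return each image plus its three clockwise rotations (4x the data)."""
    out = []
    for img in images:
        out.append(img)
        r = img
        for _ in range(3):
            r = rotate_cw(r)
            out.append(r)
    return out
```

Rotations are label-preserving for roughly circular cell images, which is why this simple transform safely multiplies the training data by four.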