Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability

Title: Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability
Author: Zhou, Xiaofeng
Abstract: This dissertation covers two research projects: probabilistic information retrieval modeling and third-order inference on reliability.
In the first part of this dissertation, two information retrieval topics are studied and evaluated on large-scale text collections. First, we conduct an in-depth study of the relationship between document length and document relevance to the user's information need. We propose two statistical methods that incorporate document length as a substantial weighting factor to achieve higher retrieval performance. Second, we use the properties of the survival function to propose a cost-based re-ranking method that promotes ranking diversity in biomedical information retrieval, and to model the proximity between query terms to improve retrieval performance. In extensive experiments on standard TREC collections, the proposed models perform significantly better than classical probabilistic information retrieval models.
In the second part of this dissertation, a small-sample asymptotic method is proposed for higher-order inference in the stress-strength reliability model, R = P(Y < X), where X and Y are independently distributed. A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood. Simulation studies are conducted on two distributions: the Burr type X distribution and the exponentiated exponential distribution. The simulation results show that the proposed method is very accurate even when the sample sizes are small.
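Illustrative note (not part of the original record): the stress-strength quantity R = P(Y < X) named in the abstract can be made concrete with a small simulation. The sketch below is not the dissertation's third-order method. It assumes, for illustration only, that X and Y follow exponentiated exponential distributions with CDF F(x) = (1 - exp(-lam * x)) ** alpha and a shared scale parameter lam. Under that assumption R has the closed form alpha_x / (alpha_x + alpha_y). All function and parameter names are hypothetical.

```python
import math
import random


def sample_ee(alpha, lam, rng):
    """Draw one value from an exponentiated exponential distribution.

    Assumed parametrization: F(x) = (1 - exp(-lam * x)) ** alpha, x > 0.
    Sampling uses the inverse CDF: x = -log(1 - u ** (1 / alpha)) / lam.
    """
    u = rng.random()
    while u == 0.0:  # avoid log(0); random() can return exactly 0.0
        u = rng.random()
    # -expm1(log(u) / alpha) equals 1 - u ** (1 / alpha), computed stably
    p = -math.expm1(math.log(u) / alpha)
    return -math.log(p) / lam


def closed_form_reliability(alpha_x, alpha_y):
    """R = P(Y < X) when X and Y share the same scale parameter."""
    return alpha_x / (alpha_x + alpha_y)


def monte_carlo_reliability(alpha_x, alpha_y, lam, n_draws, seed=0):
    """Estimate R = P(Y < X) by simulating independent pairs (X, Y)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = sample_ee(alpha_x, lam, rng)
        y = sample_ee(alpha_y, lam, rng)
        if y < x:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    estimate = monte_carlo_reliability(3.0, 1.0, lam=2.0, n_draws=20000)
    print("Monte Carlo estimate:", estimate)
    print("Closed form:", closed_form_reliability(3.0, 1.0))
```

With many draws the simulated value should sit close to the closed form. The small-sample setting studied in the dissertation is the case where such a point estimate alone is not enough and accurate interval inference for R is needed.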
Subject: Information technology
Statistics
Mathematics
Keywords: Reliability
Information retrieval
Probabilistic IR
BM25
Document length
Survival modeling
Diversity
Re-rank
Biomedical IR
Thesaurus
Aspect
Term proximity
Third-order inference
Type: Electronic Thesis or Dissertation
Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
URI: http://hdl.handle.net/10315/28237
Supervisor: Wong, Augustine Chi Mou
Degree: PhD - Doctor of Philosophy
Program: Mathematics & Statistics
Exam date: 2014-06-26
Publish on: 2015-01-26