Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability
This thesis examines two research projects: probabilistic information retrieval modeling and third-order inference on reliability. The first part of the dissertation addresses two research topics in information retrieval, both evaluated experimentally on large-scale text data sets. First, we conduct an in-depth study of the relationship between document length and document relevance to the user's information need. We propose two statistical methods that incorporate document length as a substantial weighting factor to achieve higher retrieval performance. Second, we use properties of the survival function to propose a cost-based re-ranking method that promotes ranking diversity in biomedical information retrieval, and to model the proximity between query terms to improve retrieval performance. In extensive experiments on standard TREC collections, the proposed models perform significantly better than the classical probabilistic information retrieval models. In the second part of the dissertation, a small-sample asymptotic method is proposed for higher-order inference in the stress-strength reliability model, R = P(Y < X), where X and Y are independently distributed. A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood. Simulation studies are conducted on two distributions: the Burr type X distribution and the exponentiated exponential distribution. The simulation results show that the proposed method is very accurate even when the sample sizes are small.
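To make the stress-strength quantity R = P(Y < X) concrete, the following is a minimal Monte Carlo sketch for the exponentiated exponential case. It is not the thesis's third-order method. It assumes a unit-scale parameterization with CDF F(x) = (1 - exp(-x))^shape, which the abstract does not specify, and the function names and the parameter names alpha (shape of X) and beta (shape of Y) are hypothetical. Under that assumption, R has the closed form alpha / (alpha + beta), which the simulation can be checked against.

```python
import math
import random

# Largest double strictly below 1.0; guards against log(0) from rounding.
_MAX_U = 1.0 - 2.0 ** -53


def sample_exp_exp(shape, rng):
    """Draw one value with CDF F(x) = (1 - exp(-x))**shape (assumed unit scale)."""
    # Inverse-CDF sampling: solve u = F(x) for x.
    u = min(rng.random() ** (1.0 / shape), _MAX_U)
    return -math.log(1.0 - u)


def estimate_reliability(alpha, beta, n_draws=100_000, seed=0):
    """Monte Carlo estimate of R = P(Y < X) for independent X and Y.

    alpha is the assumed shape of X (strength), beta the assumed shape of Y (stress).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        x = sample_exp_exp(alpha, rng)
        y = sample_exp_exp(beta, rng)
        if y < x:
            hits += 1
    return hits / n_draws


def exact_reliability(alpha, beta):
    """Closed form of R under the assumed common-scale parameterization."""
    return alpha / (alpha + beta)


if __name__ == "__main__":
    print(estimate_reliability(2.0, 1.0), exact_reliability(2.0, 1.0))
```

The same closed form holds for any pair of distributions whose CDFs are powers of a common base CDF, which under a common scale includes the Burr type X case. Interval estimation for R from small samples is the harder problem that the thesis addresses.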