
Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability

dc.contributor.advisor: Wong, Augustine Chi Mou
dc.creator: Zhou, Xiaofeng
dc.date.accessioned: 2015-01-26T14:50:18Z
dc.date.available: 2015-01-26T14:50:18Z
dc.date.copyright: 2014-06-26
dc.date.issued: 2015-01-26
dc.date.updated: 2015-01-26T14:50:18Z
dc.degree.discipline: Mathematics & Statistics
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: This thesis examines two research projects: probabilistic information retrieval modeling and third-order inference on reliability. In the first part of this dissertation, two research topics in information retrieval are investigated and evaluated on large-scale text data sets. First, we conduct an in-depth study of the relationship between document length and document relevance to the user's need. Two statistical methods are proposed that incorporate document length as a substantial weighting factor to achieve higher retrieval performance. Second, we use the properties of the survival function to propose a cost-based re-ranking method that promotes ranking diversity for biomedical information retrieval, and to model the proximity between query terms to improve retrieval performance. Extensive experiments on standard TREC collections show that our proposed models perform significantly better than the classical probabilistic information retrieval models. In the second part of this dissertation, a small-sample asymptotic method is proposed for higher-order inference in the stress-strength reliability model, R = P(Y < X), where X and Y are independently distributed. A penalized likelihood method is proposed to handle the numerical complications of maximizing the constrained likelihood. Simulation studies are conducted on two distributions: the Burr type X distribution and the exponentiated exponential distribution. Results from the simulation studies show that the proposed method is very accurate even when the sample sizes are small.
dc.identifier.uri: http://hdl.handle.net/10315/28237
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Information technology
dc.subject: Statistics
dc.subject: Mathematics
dc.subject.keywords: Reliability [en_US]
dc.subject.keywords: Information retrieval [en_US]
dc.subject.keywords: Probabilistic IR [en_US]
dc.subject.keywords: BM25 [en_US]
dc.subject.keywords: Document length [en_US]
dc.subject.keywords: Survival modeling [en_US]
dc.subject.keywords: Diversity [en_US]
dc.subject.keywords: Re-rank [en_US]
dc.subject.keywords: Biomedical IR [en_US]
dc.subject.keywords: Thesaurus [en_US]
dc.subject.keywords: Aspect [en_US]
dc.subject.keywords: Term proximity [en_US]
dc.subject.keywords: Third-order inference [en_US]
dc.title: Statistical Modeling to Information Retrieval for Searching from Big Text Data and Higher Order Inference for Reliability
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Zhou_Xiaofeng_2014_PhD.pdf
Size: 940.82 KB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text