Term Association Modelling in Information Retrieval

Zhao, Jiashu

dc.contributor.advisor	Huang, Xiangji
dc.creator	Zhao, Jiashu
dc.date.accessioned	2015-08-28T15:38:01Z
dc.date.available	2015-08-28T15:38:01Z
dc.date.copyright	2015-03-23
dc.date.issued	2015-08-28
dc.date.updated	2015-08-28T15:38:01Z
dc.degree.discipline	Computer Science
dc.degree.level	Doctoral
dc.degree.name	PhD - Doctor of Philosophy
dc.description.abstract	Many traditional Information Retrieval (IR) models assume that query terms are independent of each other. In those models, a document is normally represented as a bag of words/terms and their frequencies. Although traditional retrieval models achieve reasonably good performance in many applications, the independence assumption has limitations. Some recent studies investigate how to model term associations/dependencies with proximity measures. However, theoretical modeling of term associations under the probabilistic retrieval framework is still largely unexplored. In this thesis, I propose a new concept named Cross Term to model term proximity, with the aim of boosting retrieval performance. With Cross Terms, the association of multiple query terms can be modeled in the same way as a simple unigram term. In particular, an occurrence of a query term is assumed to have an impact on its neighboring text. The degree of this impact gradually weakens with increasing distance from the place of occurrence. Shape functions are used to characterize such impacts. Based on this assumption, I first propose a bigram CRoss TErm Retrieval (CRTER2) model for probabilistic IR and a language-model-based variant, CRTER2LM. Specifically, a bigram Cross Term occurs when the corresponding query terms appear close to each other, and its impact can be modeled by the intersection of the respective shape functions of the query terms. Second, I propose a generalized n-gram CRoss TErm Retrieval (CRTERn) model, defined recursively for n query terms where n>2. For n-gram Cross Terms, I develop several distance metrics with different properties and employ them in the proposed models for ranking. Third, an enhanced context-sensitive proximity model is proposed to boost the CRTER models, where the contextual relevance of term proximity is studied.
The models are validated on several large standard data sets and show improved performance over other state-of-the-art approaches. I also discuss the practical impact of the proposed models. The approaches in this thesis can also benefit term association modeling in other domains.
dc.identifier.uri	http://hdl.handle.net/10315/30082
dc.language.iso	en
dc.rights	Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject	Computer science
dc.subject.keywords	Information Retrieval
dc.subject.keywords	Term Association
dc.subject.keywords	Context-sensitive
dc.subject.keywords	Bi-gram
dc.subject.keywords	N-gram
dc.subject.keywords	Search
dc.subject.keywords	Proximity
dc.subject.keywords	Kernel Function
dc.subject.keywords	Indexing
dc.subject.keywords	Experimentation
dc.subject.keywords	modeling
dc.title	Term Association Modelling in Information Retrieval
dc.type	Electronic Thesis or Dissertation	en_US
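
The abstract in this record describes a bigram Cross Term as the intersection of two query terms' shape functions. A minimal Python sketch of that idea follows. It is an illustration only, not the thesis's implementation: the Gaussian shape, the `sigma` parameter, and all function names are assumptions made here, and both terms are assumed to share the same symmetric kernel, so the two curves cross at the midpoint between the occurrences.

```python
import math


def gaussian_shape(distance, sigma=2.0):
    """Hypothetical shape function: impact of a term occurrence at a given
    distance (in token positions). It is 1.0 at the occurrence itself and
    decays as the distance grows. The Gaussian form is an assumption."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def intersection_height(pos_a, pos_b, sigma=2.0):
    """Height at which two identical, symmetric shape functions centred at
    pos_a and pos_b cross. With identical kernels the crossing point is the
    midpoint, so each curve is evaluated at half the distance."""
    half_distance = abs(pos_a - pos_b) / 2
    return gaussian_shape(half_distance, sigma)


def cross_term_weight(positions_a, positions_b, sigma=2.0):
    """Illustrative aggregate for one document: sum the intersection heights
    over every pair of occurrences of the two query terms. Summing over all
    pairs is an assumption of this sketch, not a claim about the thesis."""
    return sum(
        intersection_height(a, b, sigma)
        for a in positions_a
        for b in positions_b
    )


if __name__ == "__main__":
    # Made-up token positions for two query terms in one document.
    near = cross_term_weight([3], [4])
    far = cross_term_weight([3], [30])
    print(f"adjacent occurrences: {near:.4f}")
    print(f"distant occurrences:  {far:.4f}")
```

Under these assumptions, occurrences that sit closer together contribute more to the aggregate than distant ones, which matches the abstract's statement that a term's impact weakens with distance from its place of occurrence.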

Files

Original bundle

Name: Zhao_Jiashu_2015_PhD.pdf
Size: 2.41 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text

Collections

Computer Science and Engineering