
Making a Better Query: Find Good Feedback Documents and Terms via Semantic Associations

dc.contributor.advisor: Huang, Xiangji
dc.creator: Miao, Jun
dc.date.accessioned: 2017-07-27T13:32:49Z
dc.date.available: 2017-07-27T13:32:49Z
dc.date.copyright: 2016-06-14
dc.date.issued: 2017-07-27
dc.date.updated: 2017-07-27T13:32:49Z
dc.degree.discipline: Computer Science
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: When people search, they typically enter a few keywords as a query. Because current information retrieval (IR) systems are based on term matching, a document is not considered relevant unless it contains the exact terms in the query. However, such documents are often relevant when they contain terms semantically similar to the query. A classic way to retrieve them is to expand the original query with related terms. Pseudo relevance feedback (PRF) has proven effective at expanding original queries and improving IR performance. It assumes that the top k ranked documents from the first round of retrieval are relevant, treats them as feedback documents, and expands the original query with feedback terms selected from them. However, PRF must be applied to query expansion very carefully. Wrongly added terms can introduce noise and substantially hurt the overall search experience, and the assumption that all feedback documents are relevant is too strong to hold completely. To avoid importing noise while still making significant improvements, we address this problem in four ways in this dissertation. First, we treat proximity information among terms as term semantic associations and use them to find new relevant terms. Next, to obtain good and robust PRF performance by adapting topic information, we propose a new concept named topic space and present three models based on it. Topics obtained through topic modeling help identify how relevant a feedback document is. The weights of candidate terms in the more relevant feedback documents are boosted, so these terms are more likely to be chosen. Furthermore, we apply machine learning methods to classify which feedback documents are effective for PRF. To address the lack of training data when applying machine learning methods to PRF, we improve a traditional co-training method and take the quality of classifiers into account. Finally, we present a new probabilistic framework that integrates existing effective methods, such as semantic associations, as components for further research. All the work has been tested on public datasets and shown to be effective and efficient.
dc.identifier.uri: http://hdl.handle.net/10315/33504
dc.language.iso: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: Computer Science
dc.subject.keywords: Information Retrieval
dc.subject.keywords: Topic Modeling
dc.subject.keywords: Machine Learning
dc.title: Making a Better Query: Find Good Feedback Documents and Terms via Semantic Associations
dc.type: Electronic Thesis or Dissertation
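
The pseudo relevance feedback loop summarized in the abstract can be illustrated with a minimal sketch. This is not the dissertation's method: the function names `score` and `prf_expand`, the toy corpus, and the use of raw term frequency for both ranking and term selection are assumptions made only for illustration, since the abstract does not specify the ranking or weighting models.

```python
from collections import Counter


def score(query_terms, doc_terms):
    """Toy term-matching score: total occurrences of query terms in the document."""
    counts = Counter(doc_terms)
    return sum(counts[t] for t in query_terms)


def prf_expand(query_terms, docs, k=2, n_terms=1):
    """Expand a query with the most frequent terms from the top-k ranked documents."""
    # First-round retrieval: rank documents by the term-matching score.
    ranked = sorted(docs, key=lambda d: score(query_terms, d), reverse=True)
    # PRF assumption: the top-k documents are relevant (feedback documents).
    feedback_docs = ranked[:k]
    # Count candidate feedback terms that are not already in the query.
    candidates = Counter()
    for doc in feedback_docs:
        candidates.update(t for t in doc if t not in query_terms)
    # Pick the n_terms most frequent candidates; break ties alphabetically.
    top = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))[:n_terms]
    return list(query_terms) + [term for term, _ in top]


# Hypothetical toy corpus of tokenized documents.
docs = [
    ["car", "engine", "repair", "automobile"],
    ["car", "automobile", "dealer"],
    ["banana", "fruit", "market"],
    ["automobile", "engine", "maintenance"],
]

query = ["car"]
expanded = prf_expand(query, docs, k=2, n_terms=1)
print(expanded)                  # ['car', 'automobile']
print(score(query, docs[3]))     # 0: no exact match for the original query
print(score(expanded, docs[3]))  # 1: matched after expansion
```

In this toy run the last document never mentions "car", so exact term matching misses it until the query is expanded with "automobile". Raising `n_terms` to 2 would also add "dealer", a small instance of the noise problem the abstract warns about.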

Files

Original bundle
Name: Jun_Miao_2017_PhD.pdf
Size: 3.52 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.83 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.38 KB
Format: Plain Text