Enhancing General Language Models for Biomedical Text Retrieval via Diversified Prior Knowledge

Huang, Yizheng

Files

Huang_Yizheng_2023_Masters.pdf (3.1 MB)

Date

2023-12-08

Authors

Huang, Yizheng

Abstract

The thesis introduces the Diversified Prior Knowledge Enhanced General Language Model (DPK-GLM) to improve the efficacy of general language models in biomedical Information Retrieval (IR). General language models often struggle with biomedical data due to its specialized terminology and the need for precise matching. DPK-GLM tackles these challenges by integrating domain-specific knowledge, thereby enhancing the model's ability to understand and process biomedical information. The framework comprises three core components. The first, Knowledge-based Query Expansion, leverages authoritative biomedical databases to enrich search queries with domain-specific entities. The second, Aspect-based Filter, identifies documents that are highly relevant to the query. The third, Diversity-based Score Reweighting, re-ranks these filtered documents by combining similarity and diversity scores, yielding more accurate results. Experimental tests on public biomedical IR datasets confirm that DPK-GLM significantly improves retrieval performance.
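
The abstract does not give the formula used by Diversity-based Score Reweighting, so the following is only a minimal sketch of the general idea of combining a similarity score with a diversity score when re-ranking. It substitutes a greedy, maximal-marginal-relevance-style rule over term sets. The function names `jaccard` and `rerank`, the `weight` parameter, and the use of Jaccard overlap are all assumptions made for illustration, not the thesis's method.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Term-set overlap in [0, 1]; a stand-in for a learned similarity score."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rerank(query_terms: Set[str], docs: Dict[str, Set[str]], weight: float = 0.5) -> List[str]:
    """Greedily order documents by weight * similarity + (1 - weight) * diversity.

    Assumption: diversity is 1 minus the highest overlap with any document
    already placed in the ranking, so near-duplicates are pushed down.
    """
    remaining = dict(docs)
    ranked: List[str] = []
    while remaining:
        def combined(doc_id: str) -> float:
            terms = remaining[doc_id]
            similarity = jaccard(query_terms, terms)
            redundancy = max((jaccard(terms, docs[s]) for s in ranked), default=0.0)
            return weight * similarity + (1.0 - weight) * (1.0 - redundancy)

        # Iterate in sorted order so ties resolve deterministically by document id.
        best = max(sorted(remaining), key=combined)
        ranked.append(best)
        del remaining[best]
    return ranked


# Hypothetical toy data: "b" nearly duplicates "a", while "c" covers a different aspect.
example_docs = {
    "a": {"brca1", "breast", "cancer", "risk"},
    "b": {"brca1", "breast", "cancer", "risk", "study"},
    "c": {"brca1", "mutation", "ovarian"},
}
example_query = {"brca1", "breast", "cancer"}
```

With `weight=1.0` the ordering depends on similarity alone, so the near-duplicate stays in second place. Lowering `weight` lets the less similar but more novel document move ahead of it.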

Keywords

Information technology, Artificial intelligence, Bioinformatics

URI

https://hdl.handle.net/10315/41736

Collections

Information Systems and Technology
