Title: Comparative Analysis of Language Models on Augmented Low-Resource Datasets for Application in Question & Answering Systems
Author: Ranjbargol, Seyedehsamaneh
Supervisor: Erechtchoukova, Marina G.
Issue Date: 2024-11-07
Copyright Date: 2024-05-24
Type: Electronic Thesis or Dissertation
URI: https://hdl.handle.net/10315/42401

Abstract: This thesis aims to advance natural language processing (NLP) in question-answering (QA) systems for low-resource domains. The research presents a comparative analysis of several pre-trained language models and examines the performance gains they achieve when fine-tuned with augmented data, addressing critical questions such as the effectiveness of synthetic data and the efficiency of data augmentation techniques for improving QA systems in specialized contexts. The study focuses on developing a hybrid QA framework that can be integrated with a cloud-based information system. This approach refines the functionality and applicability of QA systems, boosting their performance in low-resource settings through targeted fine-tuning and advanced transformer models. The successful application of this method demonstrates the significant potential for specialized, AI-driven QA systems to adapt and thrive in specific environments.

Rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.

Subjects: Information technology; Computer science

Keywords: Natural language processing; NLP; Question answering systems; QA systems; Low-resource domains; Pre-trained language models; Fine-tuning; Data augmentation; Transformer models; BERT; Sentence transformers; Cosine similarity; Machine reading comprehension; Synthetic data; Active learning; Contextual synonym substitution; Hugging Face; Cloud-based applications; Domain-specific QA; Information retrieval; Deep learning; Artificial intelligence; AI; Neural networks; Semantic analysis; Language models; Machine learning; AI-driven QA systems
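
The augmentation approach named in the abstract and keywords (contextual synonym substitution with Hugging Face transformer models) can be illustrated with a short sketch. The snippet below is a minimal, hypothetical example rather than the thesis's actual pipeline: it assumes a bert-base-uncased fill-mask model, and the augment_question helper is an illustrative name introduced here. It generates question variants by letting the masked language model propose context-aware replacements for one word.

import random
from transformers import pipeline

# Masked language model used to propose context-aware replacement words
# (assumed checkpoint; the thesis may use a different model).
fill_mask = pipeline("fill-mask", model="bert-base-uncased")

def augment_question(question, n_variants=3):
    # Pick one word at random, mask it, and keep the model's top suggestions
    # that differ from the original word.
    tokens = question.split()
    if len(tokens) < 2:
        return []
    idx = random.randrange(len(tokens))
    original = tokens[idx]
    masked = " ".join(
        tokens[:idx] + [fill_mask.tokenizer.mask_token] + tokens[idx + 1:]
    )
    variants = []
    for candidate in fill_mask(masked, top_k=n_variants + 1):
        word = candidate["token_str"].strip()
        if word.lower() != original.lower():
            variants.append(" ".join(tokens[:idx] + [word] + tokens[idx + 1:]))
    return variants[:n_variants]

# Example: generate synthetic paraphrases of a domain-specific question.
print(augment_question("How do I reset my account password?"))

In a fine-tuning setup of this kind, such variants would typically be paired with the original answers to enlarge a small domain-specific training set before training the QA model.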