Chart Question Answering with an Universal Vision-Language Pretraining Approach

dc.contributor.advisor: Enamul Hoque Prince
dc.contributor.author: Parsa Kavehzadeh
dc.date.accessioned: 2023-12-08T14:33:09Z
dc.date.available: 2023-12-08T14:33:09Z
dc.date.issued: 2023-12-08
dc.date.updated: 2023-12-08T14:33:09Z
dc.degree.discipline: Computer Science
dc.degree.level: Master's
dc.degree.name: MSc - Master of Science
dc.description.abstract: Charts are widely used for data analysis, providing visual representations of and insights into complex data. To facilitate chart-based data analysis using natural language, several downstream tasks have recently been introduced, including chart question answering. However, existing methods for these tasks often rely on pretraining on language or vision-language tasks, neglecting the explicit modeling of chart structures. To address this, we first build a large corpus of charts covering diverse topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. Our experiments demonstrate that pretraining UniChart on a large corpus with chart-specific objectives, followed by fine-tuning, yields state-of-the-art performance on four downstream tasks. Moreover, our model exhibits superior generalizability to an unseen chart corpus, surpassing previous approaches that lack chart-specific objectives and use limited chart resources.
dc.identifier.uri: https://hdl.handle.net/10315/41672
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Artificial intelligence
dc.subject.keywords: natural language processing
dc.subject.keywords: information visualization
dc.subject.keywords: charts
dc.subject.keywords: chart question answering
dc.subject.keywords: question answering
dc.subject.keywords: pretraining
dc.subject.keywords: chart comprehension
dc.subject.keywords: transformers
dc.title: Chart Question Answering with an Universal Vision-Language Pretraining Approach
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Kavehzadeh_Parsa_2023_Masters.pdf
Size: 2.74 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text
Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text
