Chart Question Answering with an Universal Vision-Language Pretraining Approach
dc.contributor.advisor | Enamul Hoque Prince | |
dc.contributor.author | Parsa Kavehzadeh | |
dc.date.accessioned | 2023-12-08T14:33:09Z | |
dc.date.available | 2023-12-08T14:33:09Z | |
dc.date.issued | 2023-12-08 | |
dc.date.updated | 2023-12-08T14:33:09Z | |
dc.degree.discipline | Computer Science | |
dc.degree.level | Master's | |
dc.degree.name | MSc - Master of Science | |
dc.description.abstract | Charts are widely used for data analysis, providing visual representations of and insights into complex data. To facilitate chart-based data analysis using natural language, several downstream tasks have been introduced recently, including chart question answering. However, existing methods for these tasks often rely on pretraining on language or vision-language tasks, neglecting the explicit modeling of chart structures. To address this, we first build a large corpus of charts covering diverse topics and visual styles. We then present UniChart, a pretrained model for chart comprehension and reasoning. We propose several chart-specific pretraining tasks that include: (i) low-level tasks to extract the visual elements (e.g., bars, lines) and data from charts, and (ii) high-level tasks to acquire chart understanding and reasoning skills. Our experiments demonstrate that pretraining UniChart on a large corpus with chart-specific objectives, followed by fine-tuning, yields state-of-the-art performance on four downstream tasks. Moreover, our model generalizes better to unseen chart corpora, surpassing previous approaches that lack chart-specific objectives and use limited chart resources. | |
dc.identifier.uri | https://hdl.handle.net/10315/41672 | |
dc.language | en | |
dc.rights | Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests. | |
dc.subject | Artificial intelligence | |
dc.subject.keywords | natural language processing | |
dc.subject.keywords | information visualization | |
dc.subject.keywords | charts | |
dc.subject.keywords | chart question answering | |
dc.subject.keywords | question answering | |
dc.subject.keywords | pretraining | |
dc.subject.keywords | chart comprehension | |
dc.subject.keywords | transformers | |
dc.title | Chart Question Answering with an Universal Vision-Language Pretraining Approach | |
dc.type | Electronic Thesis or Dissertation |
Files
Original bundle
- Name: Kavehzadeh_Parsa_2023_Masters.pdf
- Size: 2.74 MB
- Format: Adobe Portable Document Format