Towards Agentic Vision Language Models for Question Answering on Interactive Dashboards
Abstract
Multimodal models, specifically Vision Language Models (VLMs), have shown increasing capabilities in data-visualization-oriented downstream tasks, saturating existing benchmarks in ever shorter spans of time. Consequently, attention has shifted to assessing their potential on new frontiers, particularly interactive environments. Most existing benchmarks center on question answering over static visualizations, an approach that does not reflect real-world analysis scenarios, where substantial decision making is required. Dashboards, although commonplace tools across many industries, have seen limited work evaluating the ability of VLMs to traverse and reason over them. To address these limitations, this thesis presents DashboardQA, a novel benchmark for interactive dashboard question answering. In total, 292 tasks encompassing 405 QA pairs are presented across 5 diverse category types, drawn from 112 carefully selected dashboards. Experimental results show that the benchmark is challenging for the various types of VLMs assessed, with the best model scoring only 38.69%.