Towards Agentic Vision Language Models for Question Answering on Interactive Dashboard

Prince, Enamul Hoque; Kartha, Aaryaman Sudhir

2026-03-10; 2026-03-10; 2025-12-15; 2026-03-10

https://hdl.handle.net/10315/43616

Multimodal models, specifically Vision Language Models (VLMs), have shown increasing capabilities in downstream tasks oriented around data visualization, reaching performance saturation in ever shorter intervals of time. Consequently, focus has shifted to assessing their potential on new frontiers, specifically interactive environments. Various benchmarks center on question answering over static visualizations, and such rudimentary setups do not reflect real-world analysis scenarios, where extensive decision making is required. Although dashboards are commonplace tools in many industries, limited work has evaluated the ability of VLMs to traverse and reason over them. To address these limitations, this thesis presents DashboardQA, a novel benchmark for interactive dashboard question answering. Overall, it comprises 292 tasks encompassing 405 QA pairs across 5 diverse categories, drawn from 112 carefully chosen dashboards. Experimental results show that the benchmark is challenging for the various types of VLMs assessed, with the best model achieving 38.69%.

Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.

Computer science

Electronic Thesis or Dissertation

2026-03-10

Natural language processing; Question answering; Human Computer Interaction; Data visualization; Interactive visualizations; Dashboards; Vision language models