Visualizations play a pivotal role in daily communication in an increasingly data-driven world. Research on multimodal large language models (MLLMs) for automated chart understanding has accelerated rapidly, with steady improvements on standard benchmarks. However, for MLLMs to be reliable, they must be robust to misleading visualizations: charts that distort the underlying data and lead readers to draw inaccurate conclusions that may support disinformation. Here, we uncover an important vulnerability: MLLM question-answering accuracy on misleading visualizations drops, on average, to the level of a random baseline. To address this, we introduce the first inference-time methods to improve performance on misleading visualizations without compromising accuracy on non-misleading ones. The most effective method extracts the underlying data table and uses a text-only LLM to answer the question based on the table. Our findings expose a critical blind spot in current research and establish benchmark results to guide future efforts toward reliable MLLMs.
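The best-performing inference-time method described above is a two-stage pipeline: first transcribe the chart into its underlying data table, then answer from the table alone so that misleading visual design cannot bias the answer. Below is a minimal sketch of that idea, assuming an OpenAI-style chat API as the backend; the model names, prompts, and helper function are illustrative and are not taken from the paper.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; backend choice is illustrative

def chart_qa_via_table(image_path: str, question: str) -> str:
    """Two-stage inference: (1) an MLLM extracts the chart's underlying data
    table, (2) a text-only LLM answers the question from that table alone."""
    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")

    # Stage 1: ask the MLLM to transcribe the chart into a data table.
    table = client.chat.completions.create(
        model="gpt-4o",  # illustrative MLLM; not necessarily the models evaluated in the paper
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Extract the underlying data table of this chart. "
                         "Return it as a markdown table with headers, no commentary."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64_image}"}},
            ],
        }],
    ).choices[0].message.content

    # Stage 2: answer from the extracted table only, so distorted axes or
    # other misleading design choices in the image cannot influence the answer.
    answer = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative text-only reader
        messages=[{
            "role": "user",
            "content": f"Data table:\n{table}\n\nQuestion: {question}\n"
                       "Answer using only the table above.",
        }],
    ).choices[0].message.content
    return answer
```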
@article{tonglet2025_2502.20503,
  title={Protecting multimodal large language models against misleading visualizations},
  author={Jonathan Tonglet and Tinne Tuytelaars and Marie-Francine Moens and Iryna Gurevych},
  journal={arXiv preprint arXiv:2502.20503},
  year={2025}
}