Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1801.08163
Cited By
DVQA: Understanding Data Visualizations via Question Answering
24 January 2018
Kushal Kafle
Brian L. Price
Scott D. Cohen
Christopher Kanan
AIMat
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DVQA: Understanding Data Visualizations via Question Answering"
50 / 242 papers shown
Title
Judging the Judges: Can Large Vision-Language Models Fairly Evaluate Chart Comprehension and Reasoning?
Md Tahmid Rahman Laskar
Mohammed Saidul Islam
Ridwan Mahbub
Ahmed Masry
Mizanur Rahman
Amran Bhuiyan
Mir Tafseer Nayeem
Shafiq R. Joty
Enamul Hoque
Jimmy X. Huang
ELM
14
0
0
13 May 2025
Critique Before Thinking: Mitigating Hallucination through Rationale-Augmented Instruction Tuning
Zexian Yang
Dian Li
Dayan Wu
Gang Liu
Weiping Wang
MLLM
LRM
36
0
0
12 May 2025
Visual Instruction Tuning with Chain of Region-of-Interest
Yixin Chen
Shuai Zhang
Boran Han
Bernie Wang
21
0
0
11 May 2025
Adaptive Markup Language Generation for Contextually-Grounded Visual Document Understanding
Han Xiao
Yina Xie
Guanxin Tan
Yinghao Chen
R. Hu
...
Peng Gao
Yafei Wen
Xiaoxin Chen
Shuai Ren
Hongsheng Li
VLM
40
0
0
08 May 2025
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
95
0
0
30 Apr 2025
ChartQA-X: Generating Explanations for Charts
Shamanthak Hegde
Pooyan Fazli
H. Seifi
14
0
0
17 Apr 2025
FUSION: Fully Integration of Vision-Language Representations for Deep Cross-Modal Understanding
Zheng Liu
Mengjie Liu
J. Chen
Jingwei Xu
Bin Cui
Conghui He
Wentao Zhang
MLLM
57
0
0
14 Apr 2025
Breaking the Data Barrier -- Building GUI Agents Through Task Generalization
Junlei Zhang
Zichen Ding
Chang Ma
Zijie Chen
Qiushi Sun
Zhenzhong Lan
Junxian He
71
0
0
14 Apr 2025
InternVL3: Exploring Advanced Training and Test-Time Recipes for Open-Source Multimodal Models
Jinguo Zhu
Weiyun Wang
Zhe Chen
Z. Liu
Shenglong Ye
...
D. Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
W. Wang
MLLM
VLM
66
7
1
14 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Y. Liu
Qi Wang
Fuzheng Zhang
MLLM
VLM
58
0
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Y. Liu
Qi Wang
Fuzheng Zhang
VLM
53
1
0
10 Apr 2025
RefChartQA: Grounding Visual Answer on Chart Images through Instruction Tuning
Alexander Vogel
Omar Moured
Yufan Chen
Jiaming Zhang
Rainer Stiefelhagen
35
0
0
29 Mar 2025
Learning to Instruct for Visual Instruction Tuning
Zhihan Zhou
Feng Hong
Jiaan Luo
Jiangchao Yao
Dongsheng Li
Bo Han
Y. Zhang
Yanfeng Wang
VLM
59
0
0
28 Mar 2025
Breaking Language Barriers in Visual Language Models via Multilingual Textual Regularization
Iñigo Pikabea
Iñaki Lacunza
Oriol Pareras
Carlos Escolano
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
VLM
49
0
0
28 Mar 2025
MLLM-Selector: Necessity and Diversity-driven High-Value Data Selection for Enhanced Visual Instruction Tuning
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Jiayi Ji
Jie Lou
Debing Zhang
Rongrong Ji
90
0
0
26 Mar 2025
Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Zitian Wang
Yue Liao
Kang Rong
Fengyun Rao
Yibo Yang
Si Liu
70
0
0
26 Mar 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Yichao Yan
VLM
42
1
0
26 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
83
0
0
25 Mar 2025
DomainCQA: Crafting Expert-Level QA from Domain-Specific Charts
Ling Zhong
Yujing Lu
Jing Yang
Weiming Li
Peng Wei
Yongheng Wang
Manni Duan
Qing Zhang
45
0
0
25 Mar 2025
On the Perception Bottleneck of VLMs for Chart Understanding
Junteng Liu
Weihao Zeng
Xiwen Zhang
Yijun Wang
Zifei Shan
Junxian He
60
0
0
24 Mar 2025
SlowFast-LLaVA-1.5: A Family of Token-Efficient Video Large Language Models for Long-Form Video Understanding
Mingze Xu
Mingfei Gao
Shiyu Li
Jiasen Lu
Zhe Gan
Zhengfeng Lai
Meng Cao
Kai Kang
Y. Yang
Afshin Dehghan
51
1
0
24 Mar 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei-Ming Shen
Xiaokang Yang
MLLM
VLM
69
0
0
18 Mar 2025
Federated Continual Instruction Tuning
Haiyang Guo
Fanhu Zeng
Fei Zhu
Wenzhuo Liu
Da-Han Wang
Jian Xu
Xu-Yao Zhang
Cheng-Lin Liu
CLL
FedML
63
1
0
17 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
60
10
0
13 Mar 2025
LMM-R1: Empowering 3B LMMs with Strong Reasoning Abilities Through Two-Stage Rule-Based RL
Yingzhe Peng
Gongrui Zhang
Miaosen Zhang
Zhiyuan You
Jie Liu
Qipeng Zhu
Kai Yang
Xingzhong Xu
Xin Geng
Xu Yang
LRM
ReLM
86
31
0
10 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
55
37
0
09 Mar 2025
Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices
Junyan Lin
Haoran Chen
Yue Fan
Yingqi Fan
Xin Jin
Hui Su
Jinlan Fu
Xiaoyu Shen
60
0
0
08 Mar 2025
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
Feng Ni
Kui Huang
Yao Lu
Wenyu Lv
Guanzhong Wang
Zeyu Chen
Y. Liu
VLM
42
0
0
06 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei-Ming Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Xiaokang Yang
VLM
43
0
0
04 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-WeiChang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
82
3
0
26 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
53
0
0
23 Feb 2025
Scaling Text-Rich Image Understanding via Code-Guided Synthetic Multimodal Data Generation
Y. Yang
Ajay Patel
Matt Deitke
Tanmay Gupta
Luca Weihs
...
Mark Yatskar
Chris Callison-Burch
Ranjay Krishna
Aniruddha Kembhavi
Christopher Clark
SyDa
68
1
0
21 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Z. Li
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Y. Wang
44
0
0
19 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
107
9
0
18 Feb 2025
Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence
Granite Vision Team
Leonid Karlinsky
Assaf Arbelle
Abraham Daniels
A. Nassar
...
Sriram Raghavan
T. Syeda-Mahmood
Peter W. J. Staar
Tal Drory
Rogerio Feris
VLM
AI4TS
102
0
0
14 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
70
10
0
28 Jan 2025
Using Large Language Models for education managements in Vietnamese with low resources
Duc Do Minh
Vinh Nguyen Van
Thang Dam Cong
36
0
0
28 Jan 2025
InternLM-XComposer2.5-Reward: A Simple Yet Effective Multi-Modal Reward Model
Yuhang Zang
Xiaoyi Dong
Pan Zhang
Yuhang Cao
Ziyu Liu
...
Haodong Duan
W. Zhang
Kai Chen
D. Lin
Jiaqi Wang
VLM
70
18
0
21 Jan 2025
Augmenting a Large Language Model with a Combination of Text and Visual Data for Conversational Visualization of Global Geospatial Data
Omar Mena
Alexandre Kouyoumdjian
Lonni Besancon
Michael Gleicher
I. Viola
Anders Ynnerman
33
0
0
17 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
91
46
0
03 Jan 2025
SBS Figures: Pre-training Figure QA from Stage-by-Stage Synthesized Images
Risa Shinoda
Kuniaki Saito
Shohei Tanaka
Tosho Hirasawa
Yoshitaka Ushiku
31
1
0
23 Dec 2024
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
102
1
0
20 Dec 2024
LLaVA-UHD v2: an MLLM Integrating High-Resolution Semantic Pyramid via Hierarchical Window Transformer
Yipeng Zhang
Y. Liu
Zonghao Guo
Yidan Zhang
Xuesong Yang
...
Yuan Yao
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
Maosong Sun
MLLM
VLM
84
0
0
18 Dec 2024
Dynamic-VLM: Simple Dynamic Visual Token Compression for VideoLLM
H. Wang
Yuxiang Nie
Yongjie Ye
Deng GuanYu
Yanjie Wang
Shuai Li
Haiyang Yu
Jinghui Lu
Can Huang
VLM
MLLM
79
1
0
12 Dec 2024
Chimera: Improving Generalist Model with Domain-Specific Experts
Tianshuo Peng
M. Li
Hongbin Zhou
Renqiu Xia
Renrui Zhang
...
Aojun Zhou
Botian Shi
Tao Chen
Bo Zhang
Xiangyu Yue
86
4
0
08 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
DOGE: Towards Versatile Visual Document Grounding and Referring
Yinan Zhou
Yuxin Chen
Haokun Lin
Shuyu Yang
Li Zhu
Zhongang Qi
Chen Ma
Ying Shan
ObjD
78
2
0
26 Nov 2024
FocusLLaVA: A Coarse-to-Fine Approach for Efficient and Effective Visual Token Compression
Yuke Zhu
Chi Xie
Shuang Liang
Bo Zheng
Sheng Guo
64
8
0
21 Nov 2024
Enhancing the Reasoning Ability of Multimodal Large Language Models via Mixed Preference Optimization
Weiyun Wang
Zhe Chen
Wenhai Wang
Yue Cao
Yangzhou Liu
...
Jinguo Zhu
X. Zhu
Lewei Lu
Yu Qiao
Jifeng Dai
LRM
57
46
1
15 Nov 2024
Multimodal Instruction Tuning with Hybrid State Space Models
Jianing Zhou
Han Li
Shuai Zhang
Ning Xie
Ruijie Wang
Xiaohan Nie
Sheng Liu
Lingyun Wang
33
0
0
13 Nov 2024
1
2
3
4
5
Next