DocVQA: A Dataset for VQA on Document Images

1 July 2020
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar

Papers citing "DocVQA: A Dataset for VQA on Document Images"

50 / 759 papers shown
GRAFT: GRaPH and Table Reasoning for Textual Alignment -- A Benchmark for Structured Instruction Following and Visual Reasoning
Abhigya Verma
Sriram Puttagunta
Seganrasan Subramanian
Sravan Ramachandran
128
1
0
21 Aug 2025
DocHop-QA: Towards Multi-Hop Reasoning over Multimodal Document Collections
Jiwon Park
Seohyun Pyeon
Jinwoo Kim
Rina Carines Cabal
Yihao Ding
S. Han
LRM
96
0
0
20 Aug 2025
AdaDocVQA: Adaptive Framework for Long Document Visual Question Answering in Low-Resource Settings
Haoxuan Li
Wei Song
Aofan Liu
Peiwu Qin
86
0
0
19 Aug 2025
Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation
Yuheng Zha
Kun Zhou
Yujia Wu
Yushu Wang
Jie Feng
Zhi Xu
Shibo Hao
Zhengzhong Liu
Eric P. Xing
Zhiting Hu
LRM, VLM
104
3
0
18 Aug 2025
LangVision-LoRA-NAS: Neural Architecture Search for Variable LoRA Rank in Vision Language Models
International Conference on Information Photonics (ICIP), 2025
Krishna Teja Chitty-Venkata
M. Emani
V. Vishwanath
VLM
68
0
0
17 Aug 2025
Simple o3: Towards Interleaved Vision-Language Reasoning
Ye Wang
Qianglong Chen
Zejun Li
Siyuan Wang
Shijie Guo
Zhirui Zhang
Zhongyu Wei
MLLM, LRM, VLM
152
12
0
16 Aug 2025
Ovis2.5 Technical Report
Shiyin Lu
Yan Zhao
Yu Xia
Yuwei Hu
Shanshan Zhao
...
Yuhui Chen
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
VLM, LRM
135
29
0
15 Aug 2025
Inclusion Arena: An Open Platform for Evaluating Large Foundation Models with Real-World Apps
Kangyu Wang
Hongliang He
Lin Liu
Ruiqi Liang
Zhenzhong Lan
Jianguo Li
ALM, ELM
142
0
0
15 Aug 2025
A Study of Commonsense Reasoning over Visual Object Properties
Abhishek Kolari
Mohammadhossein Khojasteh
Yifan Jiang
Floris den Hengst
Filip Ilievski
OCL
209
0
0
14 Aug 2025
MM-BrowseComp: A Comprehensive Benchmark for Multimodal Browsing Agents
Shilong Li
Xingyuan Bu
Wenjie Wang
Jiaheng Liu
Jun Dong
...
Wenhao Huang
Wangchunshu Zhou
Zhaoxiang Zhang
Ruizhe Ding
Shilei Wen
LLMAG, LRM
307
6
0
14 Aug 2025
MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models
Dianyi Wang
Siyuan Wang
Zejun Li
Yikun Wang
Yitong Li
Duyu Tang
Xiaoyu Shen
Xuanjing Huang
Zhongyu Wei
MoE
182
1
0
13 Aug 2025
AgriGPT: a Large Language Model Ecosystem for Agriculture
Bo Yang
Yu Zhang
Lanfei Feng
Yunkui Chen
J. Zhang
...
Yuxuan Chen
Guijun Yang
Yong He
Runhe Huang
Shijian Li
LLMAG, KELM
216
4
0
12 Aug 2025
DocThinker: Explainable Multimodal Large Language Models with Rule-based Reinforcement Learning for Document Understanding
Wenwen Yu
Zhibo Yang
Yuliang Liu
Xiang Bai
MLLM, OffRL, LRM
92
4
0
12 Aug 2025
Segmenting and Understanding: Region-aware Semantic Attention for Fine-grained Image Quality Assessment with Large Language Models
Chenyue Song
C. Hui
Haiqi Zhu
Feng Jiang
Yachun Mi
Wei Zhang
Shaohui Liu
112
2
0
11 Aug 2025
MolmoAct: Action Reasoning Models that can Reason in Space
Jason Lee
Jiafei Duan
Haoquan Fang
Yuquan Deng
Shuo Liu
...
Karen Farley
Eli VanderBilt
Ali Farhadi
Dieter Fox
Ranjay Krishna
LM&Ro, LRM
437
48
0
11 Aug 2025
VisR-Bench: An Empirical Study on Visual Retrieval-Augmented Generation for Multilingual Long Document Understanding
Jian Chen
Ming Li
Jihyung Kil
Chenguang Wang
Tong Yu
Ryan Rossi
Tianyi Zhou
Changyou Chen
Ruiyi Zhang
RALM
170
5
0
10 Aug 2025
DocR1: Evidence Page-Guided GRPO for Multi-Page Document Understanding
Junyu Xiong
Yonghui Wang
Weichao Zhao
Chenyu Liu
Bing Yin
Wengang Zhou
Houqiang Li
LRM
193
4
0
10 Aug 2025
AURA: A Fine-Grained Benchmark and Decomposed Metric for Audio-Visual Reasoning
Siminfar Samakoush Galougah
Rishie Raj
Sanjoy Chowdhury
Sayan Nag
Ramani Duraiswami
181
3
0
10 Aug 2025
Finding Needles in Images: Can Multimodal LLMs Locate Fine Details?
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Parth Thakkar
Ankush Agarwal
Prasad Kasu
Pulkit Bansal
Chaitanya Devaguptapu
88
0
0
07 Aug 2025
VFlowOpt: A Token Pruning Framework for LMMs with Visual Information Flow-Guided Optimization
Sihan Yang
Runsen Xu
Chenhang Cui
Tai Wang
Dahua Lin
Jiangmiao Pang
132
2
0
07 Aug 2025
VER-Bench: Evaluating MLLMs on Reasoning with Fine-Grained Visual Evidence
Chenhui Qiang
Zhaoyang Wei
Xumeng Han
Zipeng Wang
Siyao Li
Xiangyuan Lan
Jianbin Jiao
Zhenjun Han
LRM
84
2
0
06 Aug 2025
Are We on the Right Way for Assessing Document Retrieval-Augmented Generation?
Wenxuan Shen
Mingjia Wang
Yaochen Wang
Dongping Chen
Junjie Yang
Yao Wan
Weiwei Lin
RALM
145
1
0
05 Aug 2025
Visual Document Understanding and Reasoning: A Multi-Agent Collaboration Framework with Agent-Wise Adaptive Test-Time Scaling
Xinlei Yu
Z. Chen
Yudong Zhang
Shilin Lu
Ruolin Shen
J. Zhang
Xiaobin Hu
Yanwei Fu
Shuicheng Yan
186
13
0
05 Aug 2025
VLMQ: Efficient Post-Training Quantization for Large Vision-Language Models via Hessian Augmentation
Yufei Xue
Yushi Huang
Jiawei Shao
Jun Zhang
MQ, VLM
126
2
0
05 Aug 2025
Analyze-Prompt-Reason: A Collaborative Agent-Based Framework for Multi-Image Vision-Language Reasoning
Angelos Vlachos
Giorgos Filandrianos
Maria Lymperaiou
Nikolaos Spanos
Ilias Mitsouras
Vasileios Karampinis
Athanasios Voulodimos
LRM
132
0
0
01 Aug 2025
MAGE: Multimodal Alignment and Generation Enhancement via Bridging Visual and Semantic Spaces
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Shaojun E
Yuchen Yang
Jiaheng Wu
Yan Zhang
Tiejun Zhao
Ziyan Chen
182
0
0
29 Jul 2025
X-Omni: Reinforcement Learning Makes Discrete Autoregressive Image Generative Models Great Again
Zigang Geng
Y. Wang
Yeyao Ma
Chen Li
Yongming Rao
...
Han Hu
Xiaosong Zhang
Linus
Di Wang
Jie Jiang
172
30
0
29 Jul 2025
MMAT-1M: A Large Reasoning Dataset for Multimodal Agent Tuning
Tianhong Gao
Yannian Fu
Weiqun Wu
Haixiao Yue
Shanshan Liu
Gang Zhang
MLLM, LRM
273
1
0
29 Jul 2025
On The Role of Pretrained Language Models in General-Purpose Text Embeddings: A Survey
Meishan Zhang
Xin Zhang
X. Zhao
Shouzheng Huang
Baotian Hu
Min Zhang
253
3
0
28 Jul 2025
METEOR: Multi-Encoder Collaborative Token Pruning for Efficient Vision Language Models
Yuchen Liu
Yaoming Wang
Bowen Shi
Xiaopeng Zhang
Wenrui Dai
Chenglin Li
Hongkai Xiong
Qi Tian
152
1
0
28 Jul 2025
Multi-Agent Interactive Question Generation Framework for Long Document Understanding
Kesen Wang
Daulet Toibazar
Abdulrahman Alfulayt
Abdulaziz S. Albadawi
Ranya A. Alkahtani
Asma A. Ibrahim
Haneen A. Alhomoud
Sherif Mohamed
Pedro J. Moreno
133
3
0
27 Jul 2025
Region-based Cluster Discrimination for Visual Representation Learning
Yin Xie
Kaicheng Yang
Xiang An
Kun Wu
Yongle Zhao
...
Yumeng Wang
Ziyong Feng
Roy Miles
Ismail Elezi
Jiankang Deng
ObjD, VLM
188
2
0
26 Jul 2025
MMESGBench: Pioneering Multimodal Understanding and Complex Reasoning Benchmark for ESG Tasks
Lei Zhang
Xin Zhou
Chaoyue He
Haiyan Zhao
Y. Wu
Hong Xu
Wei Liu
Chunyan Miao
183
0
0
25 Jul 2025
HW-MLVQA: Elucidating Multilingual Handwritten Document Understanding with a Comprehensive VQA Benchmark
Aniket Pal
Ajoy Mondal
Minesh Mathew
C. V. Jawahar
VLM
92
0
0
21 Jul 2025
Docopilot: Improving Multimodal Models for Document-Level Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Yuchen Duan
Zhe Chen
Yusong Hu
Weiyun Wang
Shenglong Ye
...
Qibin Hou
Tong Lu
Jiaming Song
Jifeng Dai
Wenhai Wang
176
11
0
19 Jul 2025
Document Haystack: A Long Context Multimodal Image/Document Understanding Vision LLM Benchmark
Goeric Huybrechts
S. Ronanki
Sai Muralidhar Jayanthi
Jack FitzGerald
Srinivasan Veeravanallur
VLM
193
0
0
18 Jul 2025
Describe Anything Model for Visual Question Answering on Text-rich Images
Yen-Linh Vu
Dinh-Thang Duong
Truong-Binh Duong
Anh-Khoi Nguyen
Thanh-Huy Nguyen
...
Jianhua Xing
Xingjian Li
Tianyang Wang
Ulas Bagci
Min Xu
VLM
277
2
0
16 Jul 2025
MCA-LLaVA: Manhattan Causal Attention for Reducing Hallucination in Large Vision-Language Models
Qiyan Zhao
Xiaofeng Zhang
Yiheng Li
Yun Xing
Xiaosong Yuan
Feilong Tang
Sinan Fan
Xuhang Chen
Xuyao Zhang
Dahan Wang
239
3
0
12 Jul 2025
Animation Needs Attention: A Holistic Approach to Slides Animation Comprehension with Visual-Language Models
Yifan Jiang
Yibo Xue
Yukun Kang
Pin Zheng
Jian Peng
Feiran Wu
Changliang Xu
DiffM, VGen
249
0
0
05 Jul 2025
Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders
Yizhou Wang
Song Mao
Yang Chen
Yufan Shen
Yinqiao Yan
...
Botian Shi
Guohang Yan
Zhi Yu
Xuming Hu
Ding Wang
187
3
0
04 Jul 2025
AuroraLong: Bringing RNNs Back to Efficient Open-Ended Video Understanding
Weili Xu
Enxin Song
Wenhao Chai
Xuexiang Wen
Tian-Chun Ye
Gaoang Wang
332
5
0
03 Jul 2025
Why Reinforcement Fine-Tuning Enables MLLMs Preserve Prior Knowledge Better: A Data Perspective
Zhihao Zhang
Qiaole Dong
Qi Zhang
Jun Zhao
Enyu Zhou
...
Yanwei Fu
Changzhi Sun
Tao Gui
Xuanjing Huang
Kai Chen
CLL
216
0
0
30 Jun 2025
MEXA: Towards General Multimodal Reasoning with Dynamic Multi-Expert Aggregation
Shoubin Yu
Yue Zhang
Ziyang Wang
Jaehong Yoon
Mohit Bansal
MoE, LRM
197
3
0
20 Jun 2025
Robustness Evaluation of OCR-based Visual Document Understanding under Multi-Modal Adversarial Attacks
Dong Nguyen Tien
Dung D. Le
AAML
214
0
0
19 Jun 2025
Vision-Guided Chunking Is All You Need: Enhancing RAG with Multimodal Document Understanding
Vishesh Tripathi
Tanmay Odapally
Indraneel Das
Uday Allu
Biddwan Ahmed
VLM
229
1
0
19 Jun 2025
WikiMixQA: A Multimodal Benchmark for Question Answering over Tables and Charts
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Negar Foroutan
Angelika Romanou
Matin Ansaripour
Julian Martin Eisenschlos
Karl Aberer
R. Lebret
250
2
0
18 Jun 2025
SimpleDoc: Multi-Modal Document Understanding with Dual-Cue Page Retrieval and Iterative Refinement
Chelsi Jain
Yiran Wu
Yifan Zeng
Jiale Liu
Shengyu Dai
Zhenwen Shao
Qingyun Wu
Huazheng Wang
203
7
0
16 Jun 2025
EMLoC: Emulator-based Memory-efficient Fine-tuning with LoRA Correction
Hsi-Che Lin
Yu-Chu Yu
Kai-Po Chang
Y. Wang
262
0
0
13 Jun 2025
Manager: Aggregating Insights from Unimodal Experts in Two-Tower VLMs and MLLMs
Xiao Xu
L. Qin
Wanxiang Che
Min-Yen Kan
MoE, VLM
308
0
0
13 Jun 2025
VLM@school -- Evaluation of AI image understanding on German middle school knowledge
René Peinl
Vincent Tischler
CoGe, VLM
262
0
0
13 Jun 2025