ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2007.00398
  4. Cited By
DocVQA: A Dataset for VQA on Document Images
v1v2v3 (latest)

DocVQA: A Dataset for VQA on Document Images

1 July 2020
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
ArXiv (abs)PDFHTMLHuggingFace (2 upvotes)

Papers citing "DocVQA: A Dataset for VQA on Document Images"

50 / 759 papers shown
SCOB: Universal Text Understanding via Character-wise Supervised
  Contrastive Learning with Online Text Rendering for Bridging Domain Gap
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain GapIEEE International Conference on Computer Vision (ICCV), 2023
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
279
4
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLMMLLM
260
89
0
20 Sep 2023
PDFTriage: Question Answering over Long, Structured Documents
PDFTriage: Question Answering over Long, Structured DocumentsConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jon Saad-Falcon
Joe Barrow
Alexa F. Siu
A. Nenkova
David Seunghyun Yoon
Ryan Rossi
Franck Dernoncourt
RALM
264
29
0
16 Sep 2023
Long-Range Transformer Architectures for Document Understanding
Long-Range Transformer Architectures for Document Understanding
Thibault Douzon
S. Duffner
Christophe Garcia
Jérémy Espinas
VLM
177
3
0
11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Shiyang Feng
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
276
152
0
07 Sep 2023
Understanding Video Scenes through Text: Insights from Text-based Video
  Question Answering
Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
194
2
0
04 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding
  with Selective Region Concentration
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region ConcentrationIEEE International Conference on Computer Vision (ICCV), 2023
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
200
16
0
03 Sep 2023
Distraction-free Embeddings for Robust VQA
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
198
0
0
31 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding,
  Localization, Text Reading, and Beyond
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLMVLMObjD
513
1,565
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLMMLLM
296
74
0
23 Aug 2023
Knowledge Graph Prompting for Multi-Document Question Answering
Knowledge Graph Prompting for Multi-Document Question AnsweringAAAI Conference on Artificial Intelligence (AAAI), 2023
Yu Wang
Nedim Lipka
Ryan Rossi
Alexa F. Siu
Ruiyi Zhang
Hanyu Wang
RALM
514
225
0
22 Aug 2023
DocPrompt: Large-scale continue pretrain for zero-shot and few-shot
  document question answering
DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering
Sijin Wu
Dan Zhang
Teng Hu
Shikun Feng
89
1
0
21 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual
  Questions
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual QuestionsAAAI Conference on Artificial Intelligence (AAAI), 2023
Wenbo Hu
Y. Xu
Jian Wang
W. Li
Zhe Chen
Zhuowen Tu
MLLMVLM
347
188
0
19 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure
  Modeling
Enhancing Visually-Rich Document Understanding via Layout Structure ModelingACM Multimedia (ACM MM), 2023
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
147
11
0
15 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative
  Instructions
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative InstructionsInternational Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
312
89
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
Tiny LVLM-eHub: Early Multimodal Experiments with BardIEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Wenqi Shao
Yutao Hu
Shiyang Feng
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Jiaming Song
Yuning Qiao
Ping Luo
VLMMLLM
207
24
0
07 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for
  First-Order Logic
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order LogicIEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
154
7
0
03 Aug 2023
Making the V in Text-VQA Matter
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
181
4
0
01 Aug 2023
HiREN: Towards Higher Supervision Quality for Better Scene Text Image
  Super-Resolution
HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution
Minyi Zhao
Yi Xu
Bingjia Li
Jie Wang
Jihong Guan
Shuigeng Zhou
261
2
0
31 Jul 2023
Workshop on Document Intelligence Understanding
Workshop on Document Intelligence UnderstandingInternational Conference on Information and Knowledge Management (CIKM), 2023
S. Han
Yihao Ding
Siwen Luo
J. Poon
HeeGuen Yoon
Zhe Huang
P. Duuring
E. Holden
117
1
0
31 Jul 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring
  Instruction Tuning
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction TuningInternational Joint Conference on Artificial Intelligence (IJCAI), 2023
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLMLRM
169
69
0
18 Jul 2023
PAT: Parallel Attention Transformer for Visual Question Answering in
  Vietnamese
PAT: Parallel Attention Transformer for Visual Question Answering in VietnameseInternational Conference on Multimedia Analysis and Pattern Recognition (ICMAPR), 2023
Nghia Hieu Nguyen
Kiet Van Nguyen
208
2
0
17 Jul 2023
Reading Between the Lanes: Text VideoQA on the Road
Reading Between the Lanes: Text VideoQA on the RoadIEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
George Tom
Minesh Mathew
Sergi Garcia
Dimosthenis Karatzas
C. V. Jawahar
273
20
0
08 Jul 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document
  Understanding
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Chenliang Li
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
VLMMLLM
225
155
0
04 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image
  Understanding
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Jiuxiang Gu
Nedim Lipka
Diyi Yang
Tongfei Sun
VLMMLLM
316
283
0
29 Jun 2023
Natural Language Generation for Advertising: A Survey
Natural Language Generation for Advertising: A Survey
Soichiro Murakami
Sho Hoshino
Peinan Zhang
185
15
0
22 Jun 2023
On Evaluation of Document Classification using RVL-CDIP
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
262
4
0
21 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large
  Vision-Language Models
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language ModelsIEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELMMLLM
309
230
0
15 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
DocumentNet: Bridging the Data Gap in Document Pre-TrainingConference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
97
3
0
15 Jun 2023
M$^3$IT: A Large-Scale Dataset towards Multi-Modal Multilingual
  Instruction Tuning
M3^33IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLMVLM
376
136
0
07 Jun 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual
  Document Understanding Models
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding ModelsAnnual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLMOOD
162
4
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
DocFormerv2: Local Features for Document UnderstandingAAAI Conference on Artificial Intelligence (AAAI), 2023
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
247
57
0
02 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image
  Question Answering
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
403
34
0
01 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
334
252
0
29 May 2023
DAPR: A Benchmark on Document-Aware Passage Retrieval
DAPR: A Benchmark on Document-Aware Passage RetrievalAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Kexin Wang
Nils Reimers
Iryna Gurevych
351
9
0
23 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for
  Visual Document Understanding
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
105
2
0
16 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
On the Hidden Mystery of OCR in Large Multimodal ModelsScience China Information Sciences (Sci China Inf Sci), 2023
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLMMLLM
405
117
0
13 May 2023
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual
  Question Answering in Vietnamese
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in VietnameseInformation Fusion (Inf. Fusion), 2023
Nghia Hieu Nguyen
Duong T.D. Vo
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
194
27
0
07 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text
  Documents via Semantic-Oriented Hierarchical Graphs
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical GraphsInternational Conference on Language Resources and Evaluation (LREC), 2023
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
217
7
0
03 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document
  Segmentation
SelfDocSeg: A Self-Supervised vision-based Approach towards Document SegmentationIEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Subhajit Maity
Sanket Biswas
Siladittya Manna
Ayan Banerjee
Josep Lladós
Saumik Bhattacharya
Umapada Pal
176
10
0
01 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Shiyang Feng
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
288
704
0
28 Apr 2023
Information Redundancy and Biases in Public Document Information
  Extraction Benchmarks
Information Redundancy and Biases in Public Document Information Extraction BenchmarksIEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
S. Laatiri
Pirashanth Ratnamogan
Joel Tang
Laurent Lam
William Vanhuffel
Fabien Caspani
144
2
0
28 Apr 2023
MPMQA: Multimodal Question Answering on Product Manuals
MPMQA: Multimodal Question Answering on Product ManualsAAAI Conference on Artificial Intelligence (AAAI), 2023
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
192
14
0
19 Apr 2023
Deep Unrestricted Document Image Rectification
Deep Unrestricted Document Image RectificationIEEE transactions on multimedia (IEEE TMM), 2023
Hao Feng
Shaokai Liu
Jiajun Deng
Wen-gang Zhou
Houqiang Li
ViT
298
24
0
18 Apr 2023
A Question-Answering Approach to Key Value Pair Extraction from
  Form-like Document Images
A Question-Answering Approach to Key Value Pair Extraction from Form-like Document ImagesAAAI Conference on Artificial Intelligence (AAAI), 2023
Kai Hu
Zhuoyuan Wu
Zhuoyao Zhong
Weihong Lin
Lei-huan Sun
Qiang Huo
199
14
0
17 Apr 2023
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han
402
25
0
13 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast
  Inference
Conditional Adapters: Parameter-efficient Transfer Learning with Fast InferenceNeural Information Processing Systems (NeurIPS), 2023
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Yue Liu
Yu Zhang
Ming-Wei Chang
BDLAI4CE
220
80
0
11 Apr 2023
Efficient OCR for Building a Diverse Digital History
Efficient OCR for Building a Diverse Digital HistoryAnnual Meeting of the Association for Computational Linguistics (ACL), 2023
Jacob Carlson
Tom Bryan
Melissa Dell
237
14
0
05 Apr 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init
  Attention
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Shiyang Feng
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Jiaming Song
Yu Qiao
MLLM
584
938
0
28 Mar 2023
TabIQA: Table Questions Answering on Business Document Images
TabIQA: Table Questions Answering on Business Document Images
Phuc Nguyen
N. Ly
Hideaki Takeda
Atsuhiro Takasu
LMTD
212
2
0
27 Mar 2023
Previous
123...13141516
Next