2007.00398
DocVQA: A Dataset for VQA on Document Images
1 July 2020
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
Papers citing
"DocVQA: A Dataset for VQA on Document Images"
50 / 759 papers shown
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
IEEE International Conference on Computer Vision (ICCV), 2023
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
279
4
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
260
89
0
20 Sep 2023
PDFTriage: Question Answering over Long, Structured Documents
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Jon Saad-Falcon
Joe Barrow
Alexa F. Siu
A. Nenkova
David Seunghyun Yoon
Ryan Rossi
Franck Dernoncourt
RALM
264
29
0
16 Sep 2023
Long-Range Transformer Architectures for Document Understanding
Thibault Douzon
S. Duffner
Christophe Garcia
Jérémy Espinas
VLM
177
3
0
11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Shiyang Feng
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
276
152
0
07 Sep 2023
Understanding Video Scenes through Text: Insights from Text-based Video Question Answering
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
194
2
0
04 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
IEEE International Conference on Computer Vision (ICCV), 2023
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
200
16
0
03 Sep 2023
Distraction-free Embeddings for Robust VQA
Atharvan Dogra
Deeksha Varshney
Ashwin Kalyan
Ameet Deshpande
Neeraj Kumar
198
0
0
31 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
513
1,565
0
24 Aug 2023
InstructionGPT-4: A 200-Instruction Paradigm for Fine-Tuning MiniGPT-4
Lai Wei
Zihao Jiang
Weiran Huang
Lichao Sun
VLM
MLLM
296
74
0
23 Aug 2023
Knowledge Graph Prompting for Multi-Document Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yu Wang
Nedim Lipka
Ryan Rossi
Alexa F. Siu
Ruiyi Zhang
Hanyu Wang
RALM
514
225
0
22 Aug 2023
DocPrompt: Large-scale continue pretrain for zero-shot and few-shot document question answering
Sijin Wu
Dan Zhang
Teng Hu
Shikun Feng
89
1
0
21 Aug 2023
BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions
AAAI Conference on Artificial Intelligence (AAAI), 2023
Wenbo Hu
Y. Xu
Jian Wang
W. Li
Zhe Chen
Zhuowen Tu
MLLM
VLM
347
188
0
19 Aug 2023
Enhancing Visually-Rich Document Understanding via Layout Structure Modeling
ACM Multimedia (ACM MM), 2023
Qiwei Li
Z. Li
Xiantao Cai
Bo Du
Hai Zhao
147
11
0
15 Aug 2023
Fine-tuning Multimodal LLMs to Follow Zero-shot Demonstrative Instructions
International Conference on Learning Representations (ICLR), 2023
Juncheng Li
Kaihang Pan
Zhiqi Ge
Minghe Gao
Wei Ji
Wenqiao Zhang
Tat-Seng Chua
Siliang Tang
Hanwang Zhang
Yueting Zhuang
MLLM
312
89
0
08 Aug 2023
Tiny LVLM-eHub: Early Multimodal Experiments with Bard
IEEE Transactions on Big Data (IEEE Trans. Big Data), 2023
Wenqi Shao
Yutao Hu
Shiyang Feng
Meng Lei
Kaipeng Zhang
...
Peng Xu
Siyuan Huang
Jiaming Song
Yuning Qiao
Ping Luo
VLM
MLLM
207
24
0
07 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
154
7
0
03 Aug 2023
Making the V in Text-VQA Matter
Shamanthak Hegde
Soumya Jahagirdar
Shankar Gangisetty
CoGe
181
4
0
01 Aug 2023
HiREN: Towards Higher Supervision Quality for Better Scene Text Image Super-Resolution
Minyi Zhao
Yi Xu
Bingjia Li
Jie Wang
Jihong Guan
Shuigeng Zhou
261
2
0
31 Jul 2023
Workshop on Document Intelligence Understanding
International Conference on Information and Knowledge Management (CIKM), 2023
S. Han
Yihao Ding
Siwen Luo
J. Poon
HeeGuen Yoon
Zhe Huang
P. Duuring
E. Holden
117
1
0
31 Jul 2023
ChatSpot: Bootstrapping Multimodal LLMs via Precise Referring Instruction Tuning
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Liang Zhao
En Yu
Zheng Ge
Jinrong Yang
Hao-Ran Wei
...
Jian‐Yuan Sun
Yuang Peng
Runpei Dong
Chunrui Han
Xiangyu Zhang
MLLM
LRM
169
69
0
18 Jul 2023
PAT: Parallel Attention Transformer for Visual Question Answering in Vietnamese
International Conference on Multimedia Analysis and Pattern Recognition (ICMAPR), 2023
Nghia Hieu Nguyen
Kiet Van Nguyen
208
2
0
17 Jul 2023
Reading Between the Lanes: Text VideoQA on the Road
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
George Tom
Minesh Mathew
Sergi Garcia
Dimosthenis Karatzas
C. V. Jawahar
273
20
0
08 Jul 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Chenliang Li
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
VLM
MLLM
225
155
0
04 Jul 2023
LLaVAR: Enhanced Visual Instruction Tuning for Text-Rich Image Understanding
Yanzhe Zhang
Ruiyi Zhang
Jiuxiang Gu
Nedim Lipka
Diyi Yang
Tongfei Sun
VLM
MLLM
316
283
0
29 Jun 2023
Natural Language Generation for Advertising: A Survey
Soichiro Murakami
Sho Hoshino
Peinan Zhang
185
15
0
22 Jun 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
262
4
0
21 Jun 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Peng Xu
Wenqi Shao
Kaipeng Zhang
Shiyang Feng
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
309
230
0
15 Jun 2023
DocumentNet: Bridging the Data Gap in Document Pre-Training
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Lijun Yu
Jin Miao
Xiaoyu Sun
Jiayi Chen
Alexander G. Hauptmann
H. Dai
Wei Wei
97
3
0
15 Jun 2023
M³IT: A Large-Scale Dataset towards Multi-Modal Multilingual Instruction Tuning
Lei Li
Yuwei Yin
Shicheng Li
Liang Chen
Peiyi Wang
...
Yazheng Yang
Jingjing Xu
Xu Sun
Lingpeng Kong
Qi Liu
MLLM
VLM
376
136
0
07 Jun 2023
Do-GOOD: Towards Distribution Shift Evaluation for Pre-Trained Visual Document Understanding Models
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2023
Jiabang He
Yilang Hu
Lei Wang
Xingdong Xu
Ning Liu
Hui-juan Liu
Hengtao Shen
VLM
OOD
162
4
0
05 Jun 2023
DocFormerv2: Local Features for Document Understanding
AAAI Conference on Artificial Intelligence (AAAI), 2023
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
247
57
0
02 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
403
34
0
01 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
334
252
0
29 May 2023
DAPR: A Benchmark on Document-Aware Passage Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Kexin Wang
Nils Reimers
Iryna Gurevych
351
9
0
23 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
105
2
0
16 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Science China Information Sciences (Sci China Inf Sci), 2023
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
405
117
0
13 May 2023
OpenViVQA: Task, Dataset, and Multimodal Fusion Models for Visual Question Answering in Vietnamese
Information Fusion (Inf. Fusion), 2023
Nghia Hieu Nguyen
Duong T.D. Vo
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
194
27
0
07 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
International Conference on Language Resources and Evaluation (LREC), 2023
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
217
7
0
03 May 2023
SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
Subhajit Maity
Sanket Biswas
Siladittya Manna
Ayan Banerjee
Josep Lladós
Saumik Bhattacharya
Umapada Pal
176
10
0
01 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Shiyang Feng
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Jiaming Song
Yu Qiao
MLLM
288
704
0
28 Apr 2023
Information Redundancy and Biases in Public Document Information Extraction Benchmarks
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
S. Laatiri
Pirashanth Ratnamogan
Joel Tang
Laurent Lam
William Vanhuffel
Fabien Caspani
144
2
0
28 Apr 2023
MPMQA: Multimodal Question Answering on Product Manuals
AAAI Conference on Artificial Intelligence (AAAI), 2023
Liangfu Zhang
Anwen Hu
Jing Zhang
Shuo Hu
Qin Jin
192
14
0
19 Apr 2023
Deep Unrestricted Document Image Rectification
IEEE transactions on multimedia (IEEE TMM), 2023
Hao Feng
Shaokai Liu
Jiajun Deng
Wen-gang Zhou
Houqiang Li
ViT
298
24
0
18 Apr 2023
A Question-Answering Approach to Key Value Pair Extraction from Form-like Document Images
AAAI Conference on Artificial Intelligence (AAAI), 2023
Kai Hu
Zhuoyuan Wu
Zhuoyao Zhong
Weihong Lin
Lei-huan Sun
Qiang Huo
199
14
0
17 Apr 2023
PDFVQA: A New Dataset for Real-World VQA on PDF Documents
Yihao Ding
Siwen Luo
Hyunsuk Chung
S. Han
402
25
0
13 Apr 2023
Conditional Adapters: Parameter-efficient Transfer Learning with Fast Inference
Neural Information Processing Systems (NeurIPS), 2023
Tao Lei
Junwen Bai
Siddhartha Brahma
Joshua Ainslie
Kenton Lee
...
Vincent Zhao
Yuexin Wu
Yue Liu
Yu Zhang
Ming-Wei Chang
BDL
AI4CE
220
80
0
11 Apr 2023
Efficient OCR for Building a Diverse Digital History
Annual Meeting of the Association for Computational Linguistics (ACL), 2023
Jacob Carlson
Tom Bryan
Melissa Dell
237
14
0
05 Apr 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Shiyang Feng
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Jiaming Song
Yu Qiao
MLLM
584
938
0
28 Mar 2023
TabIQA: Table Questions Answering on Business Document Images
Phuc Nguyen
N. Ly
Hideaki Takeda
Atsuhiro Takasu
LMTD
212
2
0
27 Mar 2023