arXiv: 2301.04883
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
12 January 2023
Ryota Tanaka
Kyosuke Nishida
Kosuke Nishida
Taku Hasegawa
Itsumi Saito
Kuniko Saito
Papers citing "SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images"
17 / 17 papers shown
Title
DocVideoQA: Towards Comprehensive Understanding of Document-Centric Videos through Question Answering
H. Wang
Kai Hu
Liangcai Gao
144
0
0
20 Mar 2025
LiGT: Layout-infused Generative Transformer for Visual Question Answering on Vietnamese Receipts
Thanh-Phong Le
Trung Le Chi Phan
Nghia Hieu Nguyen
Kiet Van Nguyen
ViT
44
0
0
26 Feb 2025
Quantifying Memorization and Retriever Performance in Retrieval-Augmented Vision-Language Models
Peter Carragher
Abhinand Jha
R Raghav
Kathleen M. Carley
RALM
75
0
0
20 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Ze Liu
Zhengyang Liang
Junjie Zhou
Zheng Liu
Defu Lian
OffRL
97
0
0
17 Feb 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
Navve Wasserman
Roi Pony
O. Naparstek
Adi Raz Goldfarb
Eli Schwartz
Udi Barzelay
Leonid Karlinsky
3DV
VLM
82
1
0
17 Feb 2025
PixelWorld: Towards Perceiving Everything as Pixels
Zhiheng Lyu
Xueguang Ma
Wenhu Chen
143
0
0
31 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
72
10
0
28 Jan 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan Rossi
Dinesh Manocha
95
3
0
14 Dec 2024
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents
S. Yu
C. Tang
Bokai Xu
Junbo Cui
Junhao Ran
...
Zhenghao Liu
Shuo Wang
Xu Han
Zhiyuan Liu
Maosong Sun
VLM
39
22
0
14 Oct 2024
JourneyBench: A Challenging One-Stop Vision-Language Understanding Benchmark of Generated Images
Zhecan Wang
Junzhang Liu
Chia-Wei Tang
Hani Alomari
Anushka Sivakumar
...
Haoxuan You
A. Ishmam
Kai-Wei Chang
Shih-Fu Chang
Chris Thomas
CoGe
VLM
61
2
0
19 Sep 2024
SPIQA: A Dataset for Multimodal Question Answering on Scientific Papers
Shraman Pramanick
Rama Chellappa
Subhashini Venugopalan
48
13
0
12 Jul 2024
PDF-MVQA: A Dataset for Multimodal Information Retrieval in PDF-based Visual Question Answering
Yihao Ding
Kaixuan Ren
Jiabin Huang
Siwen Luo
S. Han
40
1
0
19 Apr 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
27
2
0
19 Feb 2024
BloomVQA: Assessing Hierarchical Multi-modal Comprehension
Yunye Gong
Robik Shrestha
Jared Claypoole
Michael Cogswell
Arijit Ray
Christopher Kanan
Ajay Divakaran
28
0
0
20 Dec 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
355
8,457
0
28 Jan 2022
Turning Tables: Generating Examples from Semi-structured Tables for Endowing Language Models with Reasoning Skills
Ori Yoran
Alon Talmor
Jonathan Berant
ReLM
LRM
177
53
0
15 Jul 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
147
498
0
29 Dec 2020