ResearchTrend.AI
Towards Complex Document Understanding By Discrete Reasoning

25 July 2022
Fengbin Zhu
Wenqiang Lei
Fuli Feng
Chao Wang
Haozhou Zhang
Tat-Seng Chua

Papers citing "Towards Complex Document Understanding By Discrete Reasoning"

37 / 37 papers shown
M-DocSum: Do LVLMs Genuinely Comprehend Interleaved Image-Text in Document Summarization?
Haolong Yan
Kaijun Tan
Yeqing Shen
Xin Huang
Zheng Ge
Xiangyu Zhang
Si Li
Daxin Jiang
VLM
40
0
0
27 Mar 2025
PP-DocBee: Improving Multimodal Document Understanding Through a Bag of Tricks
Feng Ni
Kui Huang
Yao Lu
Wenyu Lv
Guanzhong Wang
Zeyu Chen
Y. Liu
VLM
42
0
0
06 Mar 2025
OkraLong: A Flexible Retrieval-Augmented Framework for Long-Text Query Processing
Yulong Hui
Y. Liu
Yao Lu
Huanchen Zhang
RALM
125
0
0
04 Mar 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
53
0
0
23 Feb 2025
REAL-MM-RAG: A Real-World Multi-Modal Retrieval Benchmark
Navve Wasserman
Roi Pony
O. Naparstek
Adi Raz Goldfarb
Eli Schwartz
Udi Barzelay
Leonid Karlinsky
3DV
VLM
70
1
0
17 Feb 2025
Any Information Is Just Worth One Single Screenshot: Unifying Search With Visualized Information Retrieval
Ze Liu
Zhengyang Liang
Junjie Zhou
Zheng Liu
Defu Lian
OffRL
67
0
0
17 Feb 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
J. Liu
Tao Zhang
Tao Zhang
S. Chen
...
Jianhua Xu
Haoze Sun
Mingan Lin
Zenan Zhou
Weipeng Chen
AuLLM
70
10
0
28 Jan 2025
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
103
6
0
11 Dec 2024
MMDocBench: Benchmarking Large Vision-Language Models for Fine-Grained Visual Document Understanding
Fengbin Zhu
Ziyang Liu
Xiang Yao Ng
Haohui Wu
W. Wang
Fuli Feng
Chao Wang
Huanbo Luan
Tat-Seng Chua
VLM
35
3
0
25 Oct 2024
Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks
Mengzhao Jia
Wenhao Yu
Kaixin Ma
Tianqing Fang
Zhihan Zhang
Siru Ouyang
Hongming Zhang
Meng-Long Jiang
Dong Yu
VLM
29
5
0
02 Oct 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Boxin Wang
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Wei Ping
MLLM
VLM
LRM
40
51
0
17 Sep 2024
WebQuest: A Benchmark for Multimodal QA on Web Page Sequences
Maria Wang
Srinivas Sunkara
Gilles Baechler
Jason Lin
Yun Zhu
Fedir Zubach
Lei Shu
Jindong Chen
LRM
LLMAG
18
1
0
06 Sep 2024
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
Łukasz Borchmann
Michał Pietruszka
Wojciech Jaśkowski
Dawid Jurkiewicz
Piotr Halama
...
Gabriela Nowakowska
Artur Zawłocki
Łukasz Duhr
Paweł Dyda
Michał Turski
VLM
34
1
0
08 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
29
6
0
02 Aug 2024
HiRes-LLaVA: Restoring Fragmentation Input in High-Resolution Large Vision-Language Models
Runhui Huang
Xinpeng Ding
Chunwei Wang
J. N. Han
Yulong Liu
Hengshuang Zhao
Hang Xu
Lu Hou
Wei Zhang
Xiaodan Liang
VLM
23
8
0
11 Jul 2024
MMLongBench-Doc: Benchmarking Long-context Document Understanding with Visualizations
Yubo Ma
Yuhang Zang
Liangyu Chen
Meiqi Chen
Yizhu Jiao
...
Liangming Pan
Yu-Gang Jiang
Jiaqi Wang
Yixin Cao
Aixin Sun
ELM
RALM
VLM
26
23
0
01 Jul 2024
ColPali: Efficient Document Retrieval with Vision Language Models
Manuel Faysse
Hugues Sibille
Tony Wu
Bilel Omrani
Gautier Viaud
Céline Hudelot
Pierre Colombo
VLM
60
21
0
27 Jun 2024
UDA: A Benchmark Suite for Retrieval Augmented Generation in Real-world Document Analysis
Yulong Hui
Yao Lu
Huanchen Zhang
RALM
33
9
0
21 Jun 2024
RoDLA: Benchmarking the Robustness of Document Layout Analysis Models
Yufan Chen
Jiaming Zhang
Kunyu Peng
Junwei Zheng
Ruiping Liu
Philip H. S. Torr
Rainer Stiefelhagen
OOD
29
5
0
21 Mar 2024
RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning
Congyun Jin
Ming Zhang
Xiaowei Ma
Yujiao Li
Yingbo Wang
...
Chenfei Chi
Xiangguo Lv
Fangzhou Li
Wei Xue
Yiran Huang
LM&MA
25
2
0
19 Feb 2024
InstructDoc: A Dataset for Zero-Shot Generalization of Visual Document Understanding with Instructions
Ryota Tanaka
Taichi Iki
Kyosuke Nishida
Kuniko Saito
Jun Suzuki
VLM
11
23
0
24 Jan 2024
TAT-LLM: A Specialized Language Model for Discrete Reasoning over Tabular and Textual Data
Fengbin Zhu
Ziyang Liu
Fuli Feng
Chao Wang
Moxin Li
Tat-Seng Chua
LMTD
LRM
14
15
0
24 Jan 2024
Beyond Document Page Classification: Design, Datasets, and Challenges
Jordy Van Landeghem
Sanket Biswas
Matthew B. Blaschko
Marie-Francine Moens
27
6
0
24 Aug 2023
RealCQA: Scientific Chart Question Answering as a Test-bed for First-Order Logic
Saleem Ahmed
Bhavin Jawade
Shubham Pandey
S. Setlur
Venugopal Govindaraju
13
5
0
03 Aug 2023
Document Understanding Dataset and Evaluation (DUDE)
Jordy Van Landeghem
Rubèn Pérez Tito
Łukasz Borchmann
Michal Pietruszka
Pawel Józiak
...
Bertrand Ackaert
Ernest Valveny
Matthew Blaschko
Sien Moens
Tomasz Stanislawek
VGen
14
52
0
15 May 2023
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
25
39
0
12 May 2023
Multi-View Graph Representation Learning for Answering Hybrid Numerical Reasoning Question
Yifan Wei
Fangyu Lei
Yuanzhe Zhang
Jun Zhao
Kang Liu
AIMat
8
10
0
05 May 2023
Doc2SoarGraph: Discrete Reasoning over Visually-Rich Table-Text Documents via Semantic-Oriented Hierarchical Graphs
Fengbin Zhu
Chao Wang
Fuli Feng
Zifeng Ren
Moxin Li
Tat-Seng Chua
32
3
0
03 May 2023
A Survey on Table-and-Text HybridQA: Concepts, Methods, Challenges and Future Directions
Dingzirui Wang
Longxu Dou
Wanxiang Che
14
5
0
27 Dec 2022
Hierarchical multimodal transformers for Multi-Page DocVQA
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
11
54
0
07 Dec 2022
NAPG: Non-Autoregressive Program Generation for Hybrid Tabular-Textual Question Answering
Tengxun Zhang
Hongfei Xu
Josef van Genabith
Deyi Xiong
Hongying Zan
AIMat
LRM
16
5
0
07 Nov 2022
PACIFIC: Towards Proactive Conversational Question Answering over Tabular and Textual Data in Finance
Yang Deng
Wenqiang Lei
Wenxuan Zhang
W. Lam
Tat-Seng Chua
36
51
0
17 Oct 2022
Answering Numerical Reasoning Questions in Table-Text Hybrid Contents with Graph-based Encoder and Tree-based Decoder
Fangyu Lei
Shizhu He
Xiang Li
Jun Zhao
Kang Liu
AIMat
LMTD
8
23
0
16 Sep 2022
PubTables-1M: Towards comprehensive table extraction from unstructured documents
B. Smock
Rohith Pesala
Robin Abraham
LMTD
27
96
0
30 Sep 2021
Retrieving and Reading: A Comprehensive Survey on Open-domain Question Answering
Fengbin Zhu
Wenqiang Lei
Chao Wang
Jianming Zheng
Soujanya Poria
Tat-Seng Chua
RALM
208
251
0
04 Jan 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents
Guillaume Jaume
H. K. Ekenel
Jean-Philippe Thiran
122
353
0
27 May 2019