ResearchTrend.AI
End-to-end Document Recognition and Understanding with Dessurt


30 March 2022
Brian L. Davis
B. Morse
Brian L. Price
Chris Tensmeyer
Curtis Wigington
Vlad I. Morariu
    VLM
    ViT

Papers citing "End-to-end Document Recognition and Understanding with Dessurt"

50 / 63 papers shown
DocVXQA: Context-Aware Visual Explanations for Document Question Answering
Mohamed Ali Souibgui
Changkyu Choi
Andrey Barsky
Kangsoo Jung
Ernest Valveny
Dimosthenis Karatzas
23
0
0
12 May 2025
CM1 - A Dataset for Evaluating Few-Shot Information Extraction with Large Vision Language Models
Fabian Wolf
Oliver Tüselmann
Arthur Matei
Lukas Hennies
Christoph Rass
Gernot A. Fink
50
0
0
07 May 2025
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
30
0
0
14 Apr 2025
Classifying the Unknown: In-Context Learning for Open-Vocabulary Text and Symbol Recognition
Tom Simon
William Mocaer
Pierrick Tranouez
Clément Chatelain
Thierry Paquet
MLLM
VLM
51
0
0
09 Apr 2025
VISTA-OCR: Towards generative and interactive end to end OCR models
Laziz Hamdi
Amine Tamasna
Pascal Boisson
Thierry Paquet
38
0
0
04 Apr 2025
UniHDSA: A Unified Relation Prediction Approach for Hierarchical Document Structure Analysis
Jiawei Wang
Kai Hu
Qiang Huo
53
0
0
20 Mar 2025
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion
A. Nassar
Andres Marafioti
Matteo Omenetti
Maksym Lysak
Nikolaos Livathinos
...
Yusik Kim
A. Said Gurbuz
Michele Dolfi
Miquel Farré
Peter W. J. Staar
53
3
0
14 Mar 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
46
1
0
22 Feb 2025
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
34
0
0
02 Nov 2024
"What is the value of {templates}?" Rethinking Document Information
  Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
23
0
0
20 Oct 2024
μgat: Improving Single-Page Document Parsing by Providing Multi-Page Context
Fabio Quattrini
Carmine Zaccagnino
Silvia Cascianelli
Laura Righi
Rita Cucchiara
36
1
0
28 Aug 2024
SynthDoc: Bilingual Documents Synthesis for Visual Document Understanding
Chuanghao Ding
Xuejing Liu
Wei Tang
Juan Li
Xiaoliang Wang
Rui Zhao
Cam-Tu Nguyen
Fei Tan
23
0
0
27 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
34
6
0
02 Aug 2024
VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding
Ofir Abramovich
Niv Nayman
Sharon Fogel
I. Lavi
Ron Litman
Shahar Tsiper
Royee Tichauer
Srikar Appalaraju
Shai Mazor
R. Manmatha
VLM
33
3
0
17 Jul 2024
DANIEL: A fast Document Attention Network for Information Extraction and Labelling of handwritten documents
Thomas Constum
Pierrick Tranouez
Thierry Paquet
27
5
0
12 Jul 2024
Extracting Training Data from Document-Based VQA Models
Francesco Pinto
N. Rauschmayr
F. Tramèr
Philip H. S. Torr
Federico Tombari
29
3
0
11 Jul 2024
MindBench: A Comprehensive Benchmark for Mind Map Structure Recognition and Analysis
Lei Chen
Feng Yan
Yujie Zhong
Shaoxiang Chen
Zequn Jie
Lin Ma
36
3
0
03 Jul 2024
DocKylin: A Large Multimodal Model for Visual Document Understanding with Efficient Visual Slimming
Jiaxin Zhang
Wentao Yang
Songxuan Lai
Zecheng Xie
Lianwen Jin
34
15
0
27 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
26
2
0
10 Jun 2024
GeoContrastNet: Contrastive Key-Value Edge Learning for Language-Agnostic Document Understanding
Nil Biescas
Carlos Boned Riera
Josep Lladós
Sanket Biswas
42
1
0
06 May 2024
Multi-Page Document Visual Question Answering using Self-Attention Scoring Mechanism
Lei Kang
Rubèn Pérez Tito
Ernest Valveny
Dimosthenis Karatzas
31
5
0
29 Apr 2024
HRVDA: High-Resolution Visual Document Assistant
Chaohu Liu
Kun Yin
Haoyu Cao
Xinghua Jiang
Xin Li
Yinsong Liu
Deqiang Jiang
Xing Sun
Linli Xu
VLM
35
23
0
10 Apr 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
29
38
0
08 Apr 2024
BuDDIE: A Business Document Dataset for Multi-task Information Extraction
Ran Zmigrod
Dongsheng Wang
Mathieu Sibue
Yulong Pei
Petr Babkin
...
Antony Papadimitriou
William Watson
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
25
4
0
05 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
37
26
0
28 Mar 2024
Visually Guided Generative Text-Layout Pre-training for Document Intelligence
Zhiming Mao
Haoli Bai
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
Kam-Fai Wong
32
8
0
25 Mar 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
30
8
0
25 Mar 2024
TextMonkey: An OCR-Free Large Multimodal Model for Understanding Document
Yuliang Liu
Biao Yang
Qiang Liu
Zhang Li
Zhiyin Ma
Shuo Zhang
Xiang Bai
MLLM
VLM
41
87
0
07 Mar 2024
Improving Language Understanding from Screenshots
Tianyu Gao
Zirui Wang
Adithya Bhaskar
Danqi Chen
VLM
27
10
0
21 Feb 2024
LAPDoc: Layout-Aware Prompting for Documents
Marcel Lamott
Yves-Noel Weweler
A. Ulges
Faisal Shafait
Dirk Krechel
Darko Obradovic
46
5
0
15 Feb 2024
TreeForm: End-to-end Annotation and Evaluation for Form Document Parsing
Ran Zmigrod
Zhiqiang Ma
Armineh Nourbakhsh
Sameena Shah
24
4
0
07 Feb 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
35
10
0
31 Jan 2024
Small Language Model Meets with Reinforced Vision Vocabulary
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
En Yu
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
VLM
57
40
0
23 Jan 2024
CBVS: A Large-Scale Chinese Image-Text Benchmark for Real-World Short Video Search Scenarios
Xiangshuo Qiao
Xianxin Li
Xiaozhe Qu
Jie M. Zhang
Yang Liu
Yu Luo
Cihang Jin
Jin Ma
VLM
27
0
0
19 Jan 2024
PEneo: Unifying Line Extraction, Line Grouping, and Entity Linking for End-to-end Document Pair Extraction
Zening Lin
Jiapeng Wang
Teng Li
Wenhui Liao
Dayi Huang
Longfei Xiong
Lianwen Jin
19
2
0
07 Jan 2024
DocLLM: A layout-aware generative language model for multimodal document understanding
Dongsheng Wang
Natraj Raman
Mathieu Sibue
Zhiqiang Ma
Petr Babkin
Simerjot Kaur
Yulong Pei
Armineh Nourbakhsh
Xiaomo Liu
VLM
14
50
0
31 Dec 2023
Vary: Scaling up the Vision Vocabulary for Large Vision-Language Models
Haoran Wei
Lingyu Kong
Jinyue Chen
Liang Zhao
Zheng Ge
Jinrong Yang
Jian‐Yuan Sun
Chunrui Han
Xiangyu Zhang
MLLM
VLM
66
74
0
11 Dec 2023
FATURA: A Multi-Layout Invoice Image Dataset for Document Analysis and Understanding
Mahmoud Limam
M. Dhiaf
Yousri Kessentini
12
2
0
20 Nov 2023
PixT3: Pixel-based Table-To-Text Generation
Iñigo Alonso
Eneko Agirre
Mirella Lapata
LMTD
19
5
0
16 Nov 2023
PHD: Pixel-Based Language Modeling of Historical Documents
Nadav Borenstein
Phillip Rust
Desmond Elliott
Isabelle Augenstein
18
3
0
22 Oct 2023
UReader: Universal OCR-free Visually-situated Language Understanding with Multimodal Large Language Model
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Ji Zhang
Qin Jin
Liang He
Xin Lin
Feiyan Huang
VLM
MLLM
121
84
0
08 Oct 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
23
3
0
21 Sep 2023
Kosmos-2.5: A Multimodal Literate Model
Tengchao Lv
Yupan Huang
Jingye Chen
Lei Cui
Shuming Ma
...
Weiyao Luo
Shaoxiang Wu
Guoxin Wang
Cha Zhang
Furu Wei
VLM
MLLM
23
63
0
20 Sep 2023
Attention Where It Matters: Rethinking Visual Document Understanding with Selective Region Concentration
H. Cao
Changcun Bao
Chaohu Liu
Huang-wei Chen
Kun Yin
Hao Liu
Yinsong Liu
Deqiang Jiang
Xing Sun
12
13
0
03 Sep 2023
Nougat: Neural Optical Understanding for Academic Documents
Lukas Blecher
Guillem Cucurull
Thomas Scialom
Robert Stojnic
ViT
19
106
0
25 Aug 2023
mPLUG-DocOwl: Modularized Multimodal Large Language Model for Document Understanding
Jiabo Ye
Anwen Hu
Haiyang Xu
Qinghao Ye
Mingshi Yan
...
Chenliang Li
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
VLM
MLLM
11
114
0
04 Jul 2023
On Evaluation of Document Classification using RVL-CDIP
Stefan Larson
Gordon Lim
Kevin Leach
26
3
0
21 Jun 2023
DocFormerv2: Local Features for Document Understanding
Srikar Appalaraju
Peng Tang
Qi Dong
Nishant Sankaran
Yichu Zhou
R. Manmatha
22
39
0
02 Jun 2023
Visually-Situated Natural Language Understanding with Contrastive Reading Model and Frozen Large Language Models
Geewook Kim
Hodong Lee
D. Kim
Haeji Jung
S. Park
Yoon Kim
Sangdoo Yun
Taeho Kil
Bado Lee
Seunghyun Park
VLM
35
4
0
24 May 2023
OneCAD: One Classifier for All image Datasets using multimodal learning
S. Wadekar
Eugenio Culurciello
32
0
0
11 May 2023