Vision Grid Transformer for Document Layout Analysis

IEEE International Conference on Computer Vision (ICCV), 2023
29 August 2023
Cheng Da
Chuwei Luo
Qi Zheng
Cong Yao
    ViT

Papers citing "Vision Grid Transformer for Document Layout Analysis"

OmniDocLayout: Towards Diverse Document Layout Generation via Coarse-to-Fine LLM Learning
Hengrui Kang
Zhuangcheng Gu
Zhiyuan Zhao
Zichen Wen
Bin Wang
W. Li
Conghui He
369
0
0
30 Oct 2025
Exploring OCR-augmented Generation for Bilingual VQA
JoonHo Lee
Sunho Park
VLM
129
1
0
02 Oct 2025
MonkeyOCR: Document Parsing with a Structure-Recognition-Relation Triplet Paradigm
Zhang Li
Yuliang Liu
Qiang Liu
Zhiyin Ma
Ziyang Zhang
...
Zidun Guo
Jiarui Zhang
Xinyu Wang
Xiang Bai
407
43
0
05 Jun 2025
VisionReasoner: Unified Reasoning-Integrated Visual Perception via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM, LRM
613
5
0
17 May 2025
Benchmarking Graph Neural Networks for Document Layout Analysis in Public Affairs
Miguel Lopez-Duran
Julian Fierrez
Aythami Morales
Ruben Tolosana
Oscar Delgado-Mohatar
Alvaro Ortigosa
344
3
0
12 May 2025
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
350
5
0
05 Apr 2025
SFDLA: Source-Free Document Layout Analysis
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Sebastian Tewes
Yufan Chen
Omar Moured
Kailai Li
Rainer Stiefelhagen
345
3
0
24 Mar 2025
A Simple yet Effective Layout Token in Large Language Models for Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zhaoqing Zhu
Chuwei Luo
Zirui Shao
Feiyu Gao
Hangdi Xing
Qi Zheng
Ji Zhang
360
9
0
24 Mar 2025
PP-DocLayout: A Unified Document Layout Detection Model to Accelerate Large-Scale Data Construction
Ting Sun
Cheng Cui
Yuning Du
Yi Liu
308
15
0
21 Mar 2025
TextBite: A Historical Czech Document Dataset for Logical Page Segmentation
Martin Kostelník
Karel Beneš
Michal Hradiš
236
0
0
20 Mar 2025
Reading the unreadable: Creating a dataset of 19th century English newspapers using image-to-text language models
Jonathan Bourne
561
2
0
24 Feb 2025
EDocNet: Efficient Datasheet Layout Analysis Based on Focus and Global Knowledge Distillation
Hong Cai Chen
Longchang Wu
Yang Zhang
217
1
0
23 Feb 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
370
19
0
22 Feb 2025
DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception
Zhiyuan Zhao
Hengrui Kang
Bin Wang
189
62
0
16 Oct 2024
ProcTag: Process Tagging for Assessing the Efficacy of Document Instruction Data
Yufan Shen
Chuwei Luo
Zhaoqing Zhu
Yang Chen
Qi Zheng
Zhi Yu
Jiajun Bu
Cong Yao
521
6
0
17 Jul 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
...
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
288
30
0
17 Jun 2024
DistilDoc: Knowledge Distillation for Visually-Rich Document Applications
Jordy Van Landeghem
Subhajit Maity
Ayan Banerjee
Matthew Blaschko
Marie-Francine Moens
Josep Lladós
Sanket Biswas
471
6
0
12 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
263
4
0
10 Jun 2024
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
288
3
0
01 May 2024
LayoutLLM: Layout Instruction Tuning with Large Language Models for Document Understanding
Chuwei Luo
Yufan Shen
Zhaoqing Zhu
Qi Zheng
Zhi Yu
Cong Yao
442
121
0
08 Apr 2024
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
Jiahao Lyu
Jin Wei
Gangyan Zeng
Zeng Li
Enze Xie
Wei Wang
Can Ma
VLM
351
8
0
15 Mar 2024
GraphKD: Exploring Knowledge Distillation Towards Document Object Detection with Structured Graph Creation
Ayan Banerjee
Sanket Biswas
Josep Lladós
Umapada Pal
319
4
0
17 Feb 2024
Object Recognition from Scientific Document based on Compartment Refinement Framework
SN Computer Science (SCS), 2023
Jinghong Li
Wen Gu
Koichi Ota
Shinobu Hasegawa
434
5
0
14 Dec 2023