Multimodal Pre-training Based on Graph Attention Network for Document Understanding

25 March 2022

Jun Du

Papers citing "Multimodal Pre-training Based on Graph Attention Network for Document Understanding"

Title | Authors | Tags | Counts | Date
Document Image Rectification Bases on Self-Adaptive Multitask Fusion | Heng Li, Xiangping Wu, Qingcai Chen | | 49 0 0 | 09 May 2025
DocMamba: Efficient Document Pre-training with State Space Model | Pengfei Hu, Zhenrong Zhang, Jiefeng Ma, Shuhang Liu, Jun Du, Jianshu Zhang | Mamba | 42 1 0 | 18 Sep 2024
HRDoc: Dataset and Baseline Method Toward Hierarchical Reconstruction of Document Structures | Jiefeng Ma, Jun Du, Pengfei Hu, Zhenrong Zhang, Jianshu Zhang, Huihui Zhu, Cong Liu | | 27 15 0 | 24 Mar 2023
Split, embed and merge: An accurate table structure recognizer | Zhenrong Zhang, Jianshu Zhang, Jun Du | LMTD | 97 57 0 | 12 Jul 2021
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text | Hassan Akbari, Liangzhe Yuan, Rui Qian, Wei-Hong Chuang, Shih-Fu Chang, Huayu Chen, Boqing Gong | ViT | 251 577 0 | 22 Apr 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | Yang Xu, Yiheng Xu, Tengchao Lv, Lei Cui, Furu Wei, ..., D. Florêncio, Cha Zhang, Wanxiang Che, Min Zhang, Lidong Zhou | ViT, MLLM | 153 501 0 | 29 Dec 2020
Robust Character Labeling in Movie Videos: Data Resources and Self-supervised Feature Adaptation | Krishna Somandepalli, Rajat Hebbar, Shrikanth Narayanan | CVBM | 32 5 0 | 25 Aug 2020
Improved Baselines with Momentum Contrastive Learning | Xinlei Chen, Haoqi Fan, Ross B. Girshick, Kaiming He | SSL | 279 3,378 0 | 09 Mar 2020
Spatiotemporal Co-attention Recurrent Neural Networks for Human-Skeleton Motion Prediction | Xiangbo Shu, Liyan Zhang, Guo-Jun Qi, Wei Liu, Jinhui Tang | 3DH, HAI | 44 203 0 | 29 Sep 2019
FUNSD: A Dataset for Form Understanding in Noisy Scanned Documents | Guillaume Jaume, H. K. Ekenel, Jean-Philippe Thiran | | 143 356 0 | 27 May 2019