Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution

24 January 2021
Jiapeng Wang
Chongyu Liu
Lianwen Jin
Guozhi Tang
Jiaxin Zhang
Shuaitao Zhang
Qianying Wang
Y. Wu
Mingxiang Cai

Papers citing "Towards Robust Visual Information Extraction in Real World: New Dataset and Novel Solution"

50 / 51 papers shown
Relation-Rich Visual Document Generator for Visual Information Extraction
Zi-Han Jiang
Chien-Wei Lin
Wei-Hua Li
Hsuan-Tung Liu
Yi-Ren Yeh
Chu-Song Chen
30
0
0
14 Apr 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
48
1
0
22 Feb 2025
Enhancing Document Key Information Localization Through Data Augmentation
Yue Dai
78
0
0
10 Feb 2025
SAIL: Sample-Centric In-Context Learning for Document Information Extraction
Jinyu Zhang
Zhiyuan You
Jize Wang
Xinyi Le
69
1
0
22 Dec 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
41
0
0
02 Nov 2024
"What is the value of {templates}?" Rethinking Document Information Extraction Datasets for LLMs
Ran Zmigrod
Pranav Shetty
Mathieu Sibue
Zhiqiang Ma
Armineh Nourbakhsh
Xiaomo Liu
Manuela Veloso
23
0
0
20 Oct 2024
DAViD: Domain Adaptive Visually-Rich Document Understanding with Synthetic Insights
Yihao Ding
S. Han
Zechuan Li
Hyunsuk Chung
16
0
0
02 Oct 2024
Arctic-TILT. Business Document Understanding at Sub-Billion Scale
Łukasz Borchmann
Michał Pietruszka
Wojciech Jaśkowski
Dawid Jurkiewicz
Piotr Halama
...
Gabriela Nowakowska
Artur Zawłocki
Łukasz Duhr
Paweł Dyda
Michał Turski
VLM
34
1
0
08 Aug 2024
Deep Learning based Visually Rich Document Content Understanding: A Survey
Muhammad Ali
Jean Lee
Salman Khan
39
6
0
02 Aug 2024
UNER: A Unified Prediction Head for Named Entity Recognition in Visually-rich Documents
Yi Tu
Chong Zhang
Ya Guo
Huan Chen
Jinyang Tang
Huijia Zhu
Qi Zhang
43
3
0
02 Aug 2024
SRFUND: A Multi-Granularity Hierarchical Structure Reconstruction Benchmark in Form Understanding
Jiefeng Ma
Yan Wang
Chenyu Liu
Jun Du
Yu Hu
Zhenrong Zhang
Pengfei Hu
Qing Wang
Jianshu Zhang
36
0
0
13 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
36
7
0
31 May 2024
KVP10k : A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents
O. Naparstek
Roi Pony
Inbar Shapira
Foad Abo Dahood
Ophir Azulai
...
Idan Friedman
Orit Prince
Yevgeny Burshtein
Adi Raz Goldfarb
Udi Barzelay
28
1
0
01 May 2024
CREPE: Coordinate-Aware End-to-End Document Parser
Yamato Okamoto
Youngmin Baek
Geewook Kim
Ryota Nakao
Donghyun Kim
Moonbin Yim
Seunghyun Park
Bado Lee
27
1
0
01 May 2024
Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments
Hieu Nguyen
Cong-Hoang Ta
Phuong-Thuy Le-Nguyen
Minh-Triet Tran
Trung-Truc Huynh-Le
32
0
0
01 Apr 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
45
26
0
28 Mar 2024
EK-Net:Real-time Scene Text Detection with Expand Kernel Distance
Boyuan Zhu
Fagui Liu
Xi Chen
Quan Tang
14
1
0
22 Jan 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
Mingxin Huang
Dezhi Peng
Hongliang Li
Zhenghao Peng
Chongyu Liu
Dahua Lin
Yuliang Liu
Xiang Bai
Lianwen Jin
77
1
0
15 Jan 2024
LORE++: Logical Location Regression Network for Table Structure Recognition with Pre-training
Rujiao Long
Hangdi Xing
Zhibo Yang
Qi Zheng
Zhi Yu
Cong Yao
Fei Huang
25
4
0
03 Jan 2024
Exploring OCR Capabilities of GPT-4V(ision) : A Quantitative and In-depth Evaluation
Yongxin Shi
Dezhi Peng
Wenhui Liao
Zening Lin
Xinhong Chen
Chongyu Liu
Yuyi Zhang
Lianwen Jin
MLLM
28
44
0
25 Oct 2023
A Multi-Modal Multilingual Benchmark for Document Image Classification
Yoshinari Fujinuma
Siddharth Varia
Nishant Sankaran
Srikar Appalaraju
Bonan Min
Yogarshi Vyas
VLM
18
4
0
25 Oct 2023
DocTrack: A Visually-Rich Document Dataset Really Aligned with Human Eye Movement for Machine Reading
Hao Wang
Qingxuan Wang
Yue Li
Changqing Wang
Chenhui Chu
Rui-cang Wang
VGen
21
3
0
23 Oct 2023
PPN: Parallel Pointer-based Network for Key Information Extraction with Complex Layouts
Kaiwen Wei
Jie Yao
Jingyuan Zhang
Yangyang Kang
Fubang Zhao
Yating Zhang
Changlong Sun
Xin Jin
Xin Zhang
16
4
0
20 Jul 2023
DocAligner: Annotating Real-world Photographic Document Images by Simply Taking Pictures
Jiaxin Zhang
Bangdong Chen
Hiuyi Cheng
Fengjun Guo
Kai Ding
Lianwen Jin
24
6
0
09 Jun 2023
ICDAR 2023 Competition on Structured Text Extraction from Visually-Rich Document Images
Wenwen Yu
Chengquan Zhang
H. Cao
Wei Hua
Bohan Li
...
M. Zhang
Dimosthenis Karatzas
Xingchao Sun
Jingdong Wang
Xiang Bai
26
11
0
05 Jun 2023
Layout and Task Aware Instruction Prompt for Zero-shot Document Image Question Answering
Wenjin Wang
Yunhao Li
Yixin Ou
Yin Zhang
VLM
21
24
0
01 Jun 2023
Fast-StrucTexT: An Efficient Hourglass Transformer with Modality-guided Dynamic Token Merge for Document Understanding
Mingliang Zhai
Yulin Li
Xiameng Qin
Chen Yi
Qunyi Xie
Chengquan Zhang
Kun Yao
Yuwei Wu
Yunde Jia
13
8
0
19 May 2023
Visual Information Extraction in the Wild: Practical Dataset and End-to-end Solution
Jianfeng Kuang
Wei Hua
Dingkang Liang
Mingkun Yang
Deqiang Jiang
Bo Ren
Xiang Bai
27
39
0
12 May 2023
Large Scale Genealogical Information Extraction From Handwritten Quebec Parish Records
Solène Tarride
Martin Maarand
Mélodie Boillet
James McGrath
Eugénie Capel
H. Vézina
Christopher Kermorvant
24
10
0
27 Apr 2023
DocParser: End-to-end OCR-free Information Extraction from Visually Rich Documents
M. Dhouib
G. Bettaieb
A. Shabou
17
20
0
24 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
29
19
0
23 Mar 2023
StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training
Yu Yu
Yulin Li
Chengquan Zhang
Xiaoqiang Zhang
Zengyuan Guo
Xiameng Qin
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
8
45
0
01 Mar 2023
DocILE Benchmark for Document Information Localization and Extraction
Štěpán Šimsa
Milan Šulc
Michal Uřičář
Yash J. Patel
Ahmed Hamdi
...
Matyáš Skalický
Jiří Matas
Antoine Doucet
Mickael Coustaty
Dimosthenis Karatzas
24
33
0
11 Feb 2023
Wukong-Reader: Multi-modal Pre-training for Fine-grained Visual Document Understanding
Haoli Bai
Zhiguang Liu
Xiaojun Meng
Wentao Li
Shuangning Liu
...
Liangwei Wang
Lu Hou
Jiansheng Wei
Xin Jiang
Qun Liu
ViT
24
11
0
19 Dec 2022
ERNIE-mmLayout: Multi-grained MultiModal Transformer for Document Understanding
Wenjin Wang
Zhengjie Huang
Bin Luo
Qianglong Chen
Qiming Peng
...
Weichong Yin
Shi Feng
Yu Sun
Dianhai Yu
Yin Zhang
ViT
27
11
0
18 Sep 2022
TRIE++: Towards End-to-End Information Extraction from Visually Rich Documents
Zhanzhan Cheng
Peng Zhang
Can Li
Qiao Liang
Yunlu Xu
Pengfei Li
Shiliang Pu
Yi Niu
Fei Wu
16
10
0
14 Jul 2022
Layout-Aware Information Extraction for Document-Grounded Dialogue: Dataset, Method and Demonstration
Zhenyu Zhang
Yu Bowen
Haiyang Yu
Tingwen Liu
Cheng Fu
Jingyang Li
Chengguang Tang
Jian Sun
Yongbin Li
31
6
0
14 Jul 2022
GMN: Generative Multi-modal Network for Practical Document Information Extraction
H. Cao
Jiefeng Ma
Antai Guo
Yiqing Hu
Hao Liu
Deqiang Jiang
Yinsong Liu
Bo Ren
18
8
0
11 Jul 2022
Business Document Information Extraction: Towards Practical Benchmarks
Matyáš Skalický
Štěpán Šimsa
Michal Uřičář
Milan Šulc
22
9
0
20 Jun 2022
RDU: A Region-based Approach to Form-style Document Understanding
Fengbin Zhu
Chao Wang
Wenqiang Lei
Ziyang Liu
Tat-Seng Chua
17
2
0
14 Jun 2022
LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking
Yupan Huang
Tengchao Lv
Lei Cui
Yutong Lu
Furu Wei
25
432
0
18 Apr 2022
SwinTextSpotter: Scene Text Spotting via Better Synergy between Text Detection and Text Recognition
Mingxin Huang
Yuliang Liu
Zhenghao Peng
Chongyu Liu
Dahua Lin
Shenggao Zhu
N. Yuan
Kai Ding
Lianwen Jin
ViT
13
98
0
19 Mar 2022
LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding
Jiapeng Wang
Lianwen Jin
Kai Ding
VLM
17
138
0
28 Feb 2022
Document AI: Benchmarks, Models and Applications
Lei Cui
Yiheng Xu
Tengchao Lv
Furu Wei
VLM
21
69
0
16 Nov 2021
IFR: Iterative Fusion Based Recognizer For Low Quality Scene Text Recognition
Zhiwei Jia
Shugong Xu
Shiyi Mu
Y. Tao
Shan Cao
Zhiyong Chen
11
3
0
13 Aug 2021
StrucTexT: Structured Text Understanding with Multi-Modal Transformers
Yulin Li
Yuxi Qian
Yuchen Yu
Xiameng Qin
Chengquan Zhang
Yan Liu
Kun Yao
Junyu Han
Jingtuo Liu
Errui Ding
27
113
0
06 Aug 2021
MatchVIE: Exploiting Match Relevancy between Entities for Visual Information Extraction
Guozhi Tang
Lele Xie
Lianwen Jin
Jiapeng Wang
Jingdong Chen
Zhen Xu
Qianying Wang
Yaqiang Wu
Hui Li
15
29
0
24 Jun 2021
Tag, Copy or Predict: A Unified Weakly-Supervised Learning Framework for Visual Information Extraction using Sequences
Jiapeng Wang
Tianwei Wang
Guozhi Tang
Lianwen Jin
Weihong Ma
Kai Ding
Yichao Huang
22
12
0
20 Jun 2021
ViBERTgrid: A Jointly Trained Multi-Modal 2D Document Representation for Key Information Extraction from Documents
Weihong Lin
Qifang Gao
Lei-huan Sun
Zhuoyao Zhong
Kaiqin Hu
Qin Ren
Qiang Huo
23
37
0
25 May 2021
Spatial Dependency Parsing for Semi-Structured Document Information Extraction
Wonseok Hwang
Jinyeong Yim
Seunghyun Park
Sohee Yang
Minjoon Seo
32
92
0
01 May 2020