2204.13867
Vision-Language Pre-Training for Boosting Scene Text Detectors
29 April 2022
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
Papers citing
"Vision-Language Pre-Training for Boosting Scene Text Detectors"
22 / 22 papers shown
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei-Ming Shen
Xiaokang Yang
MLLM
VLM
69
0
0
18 Mar 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
46
1
0
22 Feb 2025
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
47
2
0
28 Jul 2024
Unity in Diversity: Multi-expert Knowledge Confrontation and Collaboration for Generalizable Vehicle Re-identification
Zhenyu Kuang
Hongyang Zhang
Lidong Cheng
Yinhao Liu
Yue Huang
Xinghao Ding
Huafeng Li
29
0
0
10 Jul 2024
Zero-shot Object Counting with Good Exemplars
Huilin Zhu
Jingling Yuan
Zhengwei Yang
Yu Guo
Zheng Wang
Xian Zhong
Shengfeng He
VLM
29
6
0
06 Jul 2024
OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition
Jianqiang Wan
Sibo Song
Wenwen Yu
Yuliang Liu
Wenqing Cheng
Fei Huang
Xiang Bai
Cong Yao
Zhibo Yang
37
26
0
28 Mar 2024
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan
Pei Fu
Shan Guo
Qianyi Jiang
Xiaoming Wei
VLM
41
5
0
01 Mar 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
30
10
0
31 Jan 2024
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan
Wei Shen
Xuehang Yang
Xuehui Wang
Xiaokang Yang
27
7
0
08 Dec 2023
Turning a CLIP Model into a Scene Text Spotter
Wenwen Yu
Yuliang Liu
Xingkui Zhu
H. Cao
Xing Sun
Xiang Bai
VLM
CLIP
19
12
0
21 Aug 2023
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning
Xugong Qin
Pengyuan Lyu
Chengquan Zhang
Yu Zhou
Kun Yao
Peng-Zhen Zhang
Hailun Lin
Weiping Wang
31
12
0
14 Aug 2023
Looking and Listening: Audio Guided Text Recognition
Wenwen Yu
Mingyu Liu
Biao Yang
Enming Zhang
Deqiang Jiang
Xing Sun
Yuliang Liu
Xiang Bai
DiffM
25
1
0
06 Jun 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen-Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
14
15
0
08 May 2023
Evaluating Synthetic Pre-Training for Handwriting Processing Tasks
Vittorio Pippi
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
25
5
0
04 Apr 2023
Modeling Entities as Semantic Points for Visual Information Extraction in the Wild
Zhibo Yang
Rujiao Long
Pengfei Wang
Sibo Song
Humen Zhong
Wenqing Cheng
X. Bai
Cong Yao
19
19
0
23 Mar 2023
Turning a CLIP Model into a Scene Text Detector
Wenwen Yu
Yuliang Liu
Wei Hua
Deqiang Jiang
Bo Ren
Xiang Bai
VLM
CLIP
MLLM
25
53
0
28 Feb 2023
MaskOCR: Text Recognition with Masked Encoder-Decoder Pretraining
Pengyuan Lyu
Chengquan Zhang
Shanshan Liu
Meina Qiao
Yangliu Xu
Liang Wu
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
24
42
0
01 Jun 2022
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
93
340
0
21 Sep 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding
Yang Xu
Yiheng Xu
Tengchao Lv
Lei Cui
Furu Wei
...
D. Florêncio
Cha Zhang
Wanxiang Che
Min Zhang
Lidong Zhou
ViT
MLLM
145
498
0
29 Dec 2020
UnrealText: Synthesizing Realistic Scene Text Images from the Unreal World
Shangbang Long
Cong Yao
50
67
0
24 Mar 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
238
3,359
0
09 Mar 2020