ResearchTrend.AI
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting

8 March 2022
Chuhui Xue
Wenqing Zhang
Yu Hao
Shijian Lu
Philip H. S. Torr
Song Bai
    VLM

Papers citing "Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting"

23 / 23 papers shown
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei-Ming Shen
Xiaokang Yang
MLLM
VLM
64
0
0
18 Mar 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Y. Liu
Xiang Bai
35
1
0
22 Feb 2025
WeCromCL: Weakly Supervised Cross-Modality Contrastive Learning for Transcription-only Supervised Text Spotting
Jingjing Wu
Zhengyao Fang
Pengyuan Lyu
Chengquan Zhang
Fanglin Chen
Guangming Lu
Wenjie Pei
37
2
0
28 Jul 2024
CLII: Visual-Text Inpainting via Cross-Modal Predictive Interaction
Liang Zhao
Qing-Wu Guo
Xiaoguang Li
Song Wang
DiffM
21
0
0
23 Jul 2024
End-to-End Semi-Supervised approach with Modulated Object Queries for Table Detection in Documents
Iqraa Ehsan
Tahira Shehzadi
Didier Stricker
Muhammad Zeshan Afzal
LMTD
23
3
0
08 May 2024
TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model
Jiahao Lyu
Jin Wei
Gangyan Zeng
Zeng Li
Enze Xie
Wei Wang
Yu Zhou
VLM
19
3
0
15 Mar 2024
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan
Pei Fu
Shan Guo
Qianyi Jiang
Xiaoming Wei
VLM
24
5
0
01 Mar 2024
Hi-SAM: Marrying Segment Anything Model for Hierarchical Text Segmentation
Maoyuan Ye
Jing Zhang
Juhua Liu
Chenyu Liu
Baocai Yin
Cong Liu
Bo Du
Dacheng Tao
VLM
22
2
0
31 Jan 2024
Bridging Synthetic and Real Worlds for Pre-training Scene Text Detectors
Tongkun Guan
Wei Shen
Xuehang Yang
Xuehui Wang
Xiaokang Yang
19
7
0
08 Dec 2023
SCOB: Universal Text Understanding via Character-wise Supervised Contrastive Learning with Online Text Rendering for Bridging Domain Gap
Daehee Kim
Yoon Kim
Donghyun Kim
Yumin Lim
Geewook Kim
Taeho Kil
13
3
0
21 Sep 2023
Turning a CLIP Model into a Scene Text Spotter
Wenwen Yu
Yuliang Liu
Xingkui Zhu
H. Cao
Xing Sun
Xiang Bai
VLM
CLIP
8
12
0
21 Aug 2023
Towards Robust Real-Time Scene Text Detection: From Semantic to Instance Representation Learning
Xugong Qin
Pengyuan Lyu
Chengquan Zhang
Yu Zhou
Kun Yao
Peng-Zhen Zhang
Hailun Lin
Weiping Wang
28
13
0
14 Aug 2023
ICDAR 2023 Competition on Hierarchical Text Detection and Recognition
Shangbang Long
Siyang Qin
Dmitry Panteleev
Alessandro Bissacco
Yasuhisa Fujii
Michalis Raptis
VLM
27
17
0
16 May 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen‐Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
6
13
0
08 May 2023
Turning a CLIP Model into a Scene Text Detector
Wenwen Yu
Yuliang Liu
Wei Hua
Deqiang Jiang
Bo Ren
Xiang Bai
VLM
CLIP
MLLM
22
53
0
28 Feb 2023
CLIPTER: Looking at the Bigger Picture in Scene Text Recognition
Aviad Aberdam
David Bensaid
Alona Golts
Roy Ganz
Oren Nuriel
Royee Tichauer
Shai Mazor
Ron Litman
VLM
CLIP
17
11
0
18 Jan 2023
Domain Adaptive Scene Text Detection via Subcategorization
Zichen Tian
Chuhui Xue
Jingyi Zhang
Shijian Lu
15
3
0
01 Dec 2022
1st Place Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: End-to-End Recognition of Out of Vocabulary Words
Zhangzi Zhu
Chuhui Xue
Yu Hao
Wenqing Zhang
Song Bai
43
0
0
01 Sep 2022
Runner-Up Solution to ECCV 2022 Challenge on Out of Vocabulary Scene Text Understanding: Cropped Word Recognition
Zhangzi Zhu
Yu Hao
Wenqing Zhang
Chuhui Xue
Song Bai
17
1
0
04 Aug 2022
I2C2W: Image-to-Character-to-Word Transformers for Accurate Scene Text Recognition
Chuhui Xue
Jiaxing Huang
Wenqing Zhang
Shijian Lu
Changhu Wang
S. Bai
8
16
0
18 May 2021
Detection and Rectification of Arbitrary Shaped Scene Texts by using Text Keypoints and Links
Chuhui Xue
Shijian Lu
S. Hoi
3DPC
25
17
0
01 Mar 2021
Deep Relational Reasoning Graph Network for Arbitrary Shape Text Detection
Shi-Xue Zhang
Xiaobin Zhu
Jie-Bo Hou
Chang-rui Liu
Chun Yang
Hongfa Wang
Xu-Cheng Yin
GNN
30
181
0
17 Mar 2020
Convolutional Character Networks
Linjie Xing
Zhi Tian
Weilin Huang
Matthew R. Scott
35
155
0
17 Oct 2019