arXiv:2305.14014
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
23 May 2023
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP
VLM
Papers citing
"CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model"
23 / 23 papers shown
Title
AS3D: 2D-Assisted Cross-Modal Understanding with Semantic-Spatial Scene Graphs for 3D Visual Grounding
Feng Xiao
Hongbin Xu
Guocan Zhao
Wenxiong Kang
41
0
0
07 May 2025
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Andrea Maracani
Savas Ozkan
Sijun Cho
Hyowon Kim
Eunchung Noh
Jeongwon Min
Cho Jung Min
Dookun Park
Mete Ozay
38
0
0
20 Mar 2025
Focus, Distinguish, and Prompt: Unleashing CLIP for Efficient and Flexible Scene Text Retrieval
Gangyan Zeng
Yuan Zhang
Jin Wei
Dongbao Yang
Peng Zhang
Yiwen Gao
Xugong Qin
Yu Zhou
VLM
CLIP
13
0
0
01 Aug 2024
Classification of Non-native Handwritten Characters Using Convolutional Neural Network
F. A. Mamun
S. Chowdhury
J. E. Giti
H. Sarker
35
1
0
06 Jun 2024
HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition
Honghui Chen
Yuhang Qiu
Jiabao Wang
Pingping Chen
Nam Ling
32
0
0
15 May 2024
Glyph-ByT5: A Customized Text Encoder for Accurate Visual Text Rendering
Zeyu Liu
Weicong Liang
Zhanhao Liang
Chong Luo
Ji Li
Gao Huang
Yuhui Yuan
DiffM
64
23
0
14 Mar 2024
Beyond the Mud: Datasets and Benchmarks for Computer Vision in Off-Road Racing
Jacob Tyo
Motolani Olarinre
Youngseog Chung
Zachary Chase Lipton
20
0
0
12 Feb 2024
Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer
Tofik Ali
Partha Pratim Roy
ObjD
19
2
0
18 Jan 2024
An Empirical Study of Scaling Law for OCR
Miao Rang
Zhenni Bi
Chuanjian Liu
Yunhe Wang
Kai Han
23
6
0
29 Dec 2023
Reading Between the Mud: A Challenging Motorcycle Racer Number Dataset
Jacob Tyo
Youngseog Chung
Motolani Olarinre
Zachary Chase Lipton
11
0
0
14 Nov 2023
LVLM-eHub: A Comprehensive Evaluation Benchmark for Large Vision-Language Models
Peng-Tao Xu
Wenqi Shao
Kaipeng Zhang
Peng Gao
Shuo Liu
Meng Lei
Fanqing Meng
Siyuan Huang
Yu Qiao
Ping Luo
ELM
MLLM
23
158
0
15 Jun 2023
Masked and Permuted Implicit Context Learning for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Jin Wei
Dongbao Yang
Yu Zhou
18
7
0
25 May 2023
Levenshtein OCR
Cheng Da
P. Wang
Cong Yao
ViT
71
32
0
08 Sep 2022
Reading and Writing: Discriminative and Generative Modeling for Self-Supervised Text Recognition
Mingkun Yang
Minghui Liao
Pu Lu
Jing Wang
Shenggao Zhu
Hualin Luo
Qingzhen Tian
X. Bai
SSL
27
55
0
01 Jul 2022
Fine-grained Image Captioning with CLIP Reward
Jaemin Cho
Seunghyun Yoon
Ajinkya Kale
Franck Dernoncourt
Trung Bui
Mohit Bansal
CLIP
121
76
0
26 May 2022
TrOCR: Transformer-based Optical Character Recognition with Pre-trained Models
Minghao Li
Tengchao Lv
Jingye Chen
Lei Cui
Yijuan Lu
D. Florêncio
Cha Zhang
Zhoujun Li
Furu Wei
ViT
93
340
0
21 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
185
403
0
13 Jul 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
154
676
0
22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
303
771
0
18 Apr 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
293
3,683
0
11 Feb 2021
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
Andreas Veit
Tomas Matera
Lukás Neumann
Jirí Matas
Serge J. Belongie
175
515
0
26 Jan 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
279
39,083
0
01 Sep 2014