arXiv:1507.05717
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
21 July 2015
Baoguang Shi
X. Bai
Cong Yao
VLM
Papers citing
"An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"
50 / 680 papers shown
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
335
14
0
22 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
315
6
0
12 Feb 2025
PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts
Badri Vishal Kasuba
Dhruv Kudale
Venkatapathy Subramanian
P. Chaudhuri
Ganesh Ramakrishnan
294
1
0
10 Feb 2025
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Jiawei Liu
Yuanzhi Zhu
Feiyu Gao
Zhiyong Yang
P. Wang
Junyang Lin
Xinyu Wang
Wenyu Liu
DiffM
354
0
0
08 Jan 2025
First-place Solution for Streetscape Shop Sign Recognition Competition
Bin Wang
Li Jing
979
0
0
06 Jan 2025
Efficient Video-Based ALPR System Using YOLO and Visual Rhythm
Victor Nascimento Ribeiro
Nina S. T. Hirata
217
0
0
04 Jan 2025
Instruction-Guided Scene Text Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
490
17
0
03 Jan 2025
Disentanglement and Compositionality of Letter Identity and Letter Position in Variational Auto-Encoder Vision Models
Bruno Bianchi
Aakash Agrawal
S. Dehaene
Emmanuel Chemla
Yair Lakretz
DRL
CoGe
350
0
0
11 Dec 2024
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye
Yongkun Du
Yunbo Tao
Z. Chen
DiffM
433
2
0
02 Dec 2024
DLaVA: Document Language and Vision Assistant for Answer Localization with Enhanced Interpretability and Trustworthiness
Ahmad Mohammadshirazi
Pinaki Prasad Guha Neogi
Ser-Nam Lim
R. Ramnath
433
6
0
29 Nov 2024
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du
Z. Chen
Hongtao Xie
Caiyan Jia
Yu-Gang Jiang
384
19
0
24 Nov 2024
Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Neural Information Processing Systems (NeurIPS), 2024
Yadong Qu
Yuxin Wang
Bangbang Zhou
Zihan Wang
Hongtao Xie
Yongdong Zhang
236
2
0
23 Nov 2024
Learning based Ge'ez character handwritten recognition
Hailemicael Lulseged Yimer
Hailegabriel Dereje Degefa
Marco Cristani
Federico Cunico
193
2
0
20 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
349
0
0
18 Nov 2024
SAN: Structure-Aware Network for Complex and Long-tailed Chinese Text Recognition
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024
Jing Zhang
Chang-rui Liu
Chun Yang
167
3
0
10 Nov 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
251
0
0
02 Nov 2024
Visual Text Matters: Improving Text-KVQA with Visual Text Entity Knowledge-aware Large Multimodal Assistant
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
A. S. Penamakuri
Anand Mishra
328
2
0
24 Oct 2024
Human-Inspired Long-Term Indoor Localization in Human-Oriented Environment
Nicky Zimmerman
Matteo Sodano
232
0
0
16 Oct 2024
ChartKG: A Knowledge-Graph-Based Representation for Chart Images
IEEE Transactions on Visualization and Computer Graphics (TVCG), 2024
Zhiguang Zhou
Haoxuan Wang
Zhengqing Zhao
Fengling Zheng
Yongheng Wang
Wei Chen
Yong Wang
287
4
0
13 Oct 2024
Grounding Partially-Defined Events in Multimodal Data
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Kate Sanders
Reno Kriz
David Etter
Hannah Recknor
Alexander Martin
Cameron Carpenter
Jingyang Lin
Benjamin Van Durme
171
5
0
07 Oct 2024
HATFormer: Historic Handwritten Arabic Text Recognition with Transformers
Adrian Chan
Anupam Mijar
Mehreen Saeed
Chau-Wai Wong
Akram Khater
648
3
0
03 Oct 2024
AI-Powered Augmented Reality for Satellite Assembly, Integration and Test
Alvaro Patricio
Joao Valente
Atabak Dehban
Ines Cadilha
Daniel Reis
Rodrigo Ventura
132
3
0
26 Sep 2024
Text Image Generation for Low-Resource Languages with Dual Translation Learning
Chihiro Noguchi
Shun Fukuda
Shoichiro Mihara
Masao Yamanaka
DiffM
204
0
0
26 Sep 2024
General Detection-based Text Line Recognition
Neural Information Processing Systems (NeurIPS), 2024
Raphael Baena
Syrine Kalleli
Mathieu Aubry
981
1
0
25 Sep 2024
One Model for Two Tasks: Cooperatively Recognizing and Recovering Low-Resolution Scene Text Images by Iterative Mutual Guidance
Minyi Zhao
Yang Wang
Jihong Guan
Shuigeng Zhou
182
0
0
22 Sep 2024
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer
ACM Multimedia (MM), 2024
Humen Zhong
Zhibo Yang
Zhaohai Li
Peng Wang
Jun Tang
Wenqing Cheng
Cong Yao
252
5
0
18 Sep 2024
HTR-VT: Handwritten Text Recognition with Vision Transformer
Pattern Recognition (Pattern Recogn.), 2024
Yuting Li
Dexiong Chen
Tinglong Tang
Xi Shen
ViT
156
32
0
13 Sep 2024
Boosting CNN-based Handwriting Recognition Systems with Learnable Relaxation Labeling
S. Ferro
Alessandro Torcinovich
Arianna Traviglia
Marcello Pelillo
126
0
0
09 Sep 2024
PdfTable: A Unified Toolkit for Deep Learning-Based Table Extraction
Lei Sheng
Shuai-Shuai Xu
LMTD
200
0
0
08 Sep 2024
RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry
Zhaowei Wang
Ying Hao
Hao Wei
Qing Xiao
Lulu Chen
Yulong Li
Yue Yang
Tianyi Li
DiffM
102
2
0
05 Sep 2024
Platypus: A Generalized Specialist Model for Reading Text in Various Forms
European Conference on Computer Vision (ECCV), 2024
Peng Wang
Zhaohai Li
Jun Tang
Humen Zhong
Fei Huang
Zhibo Yang
Cong Yao
VLM
ObjD
203
2
0
27 Aug 2024
Decoder Pre-Training with only Text for Scene Text Recognition
ACM Multimedia (MM), 2024
Shuai Zhao
Yongkun Du
Zhineng Chen
Yu-Gang Jiang
154
6
0
11 Aug 2024
Image-to-LaTeX Converter for Mathematical Formulas and Text
Daniil Gurgurov
Aleksey Morshnev
ViT
VLM
188
3
0
07 Aug 2024
LEGO: Self-Supervised Representation Learning for Scene Text Images
Yujin Ren
Jiaxin Zhang
Lianwen Jin
SSL
252
0
0
04 Aug 2024
Self-Supervised Learning for Text Recognition: A Critical Survey
International Journal of Computer Vision (IJCV), 2024
Carlos Peñarrubia
J. J. Valero-Mas
Jorge Calvo-Zaragoza
424
4
0
29 Jul 2024
Visual Text Generation in the Wild
Yuanzhi Zhu
Jiawei Liu
Feiyu Gao
Wenyu Liu
Xinggang Wang
Peng Wang
Fei Huang
Cong Yao
Zhibo Yang
DiffM
240
14
0
19 Jul 2024
Qalam: A Multimodal LLM for Arabic Optical Character and Handwriting Recognition
Gagan Bhatia
El Moatez Billah Nagoudi
Fakhraddin Alwajih
Muhammad Abdul-Mageed
179
11
0
18 Jul 2024
Back to Newton's Laws: Learning Vision-based Agile Flight via Differentiable Physics
Yuang Zhang
Yu Hu
Yunlong Song
Danping Zou
Weiyao Lin
328
36
0
15 Jul 2024
Long-range Turbulence Mitigation: A Large-scale Dataset and A Coarse-to-fine Framework
Shengqi Xu
Run Sun
Yi Chang
Shuning Cao
Xueyao Xiao
Luxin Yan
189
6
0
11 Jul 2024
PosFormer: Recognizing Complex Handwritten Mathematical Expression with Position Forest Transformer
Tongkun Guan
Chengyu Lin
Wei Shen
Xiaokang Yang
267
15
0
10 Jul 2024
Spanish TrOCR: Leveraging Transfer Learning for Language Adaptation
Filipe Lauar
Valentin Laurent
142
2
0
09 Jul 2024
Focus on the Whole Character: Discriminative Character Modeling for Scene Text Recognition
Bangbang Zhou
Yadong Qu
Zixiao Wang
Zicheng Li
Boqiang Zhang
Hongtao Xie
266
3
0
08 Jul 2024
MixTex: Unambiguous Recognition Should Not Rely Solely on Real Data
Renqing Luo
Yuhan Xu
224
0
0
24 Jun 2024
Fusion of Movement and Naive Predictions for Point Forecasting in Univariate Random Walks
Cheng Zhang
148
0
0
20 Jun 2024
AnyTrans: Translate AnyText in the Image with Large Scale Models
Zhipeng Qian
Pei Zhang
Baosong Yang
Kai Fan
Yiwei Ma
Yang Li
Xiaoshuai Sun
Rongrong Ji
VLM
240
3
0
17 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
International Conference on Learning Representations (ICLR), 2024
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
261
5
0
10 Jun 2024
Classification of Non-native Handwritten Characters Using Convolutional Neural Network
F. A. Mamun
S. Chowdhury
J. E. Giti
H. Sarker
264
1
0
06 Jun 2024
Improving Text Generation on Images with Synthetic Captions
Jun Young Koh
Sang Hyun Park
Joy Song
DiffM
339
4
0
01 Jun 2024
LOGO: Video Text Spotting with Language Collaboration and Glyph Perception Model
Hongen Liu
Di Sun
Jiahao Wang
Lu Dong
Gang Pan
270
1
0
29 May 2024
Dataset and Benchmark for Urdu Natural Scenes Text Detection, Recognition and Visual Question Answering
Hiba Maryam
Ling Fu
Jiajun Song
Tajrian Abm Shafayet
Qidi Luo
Xiang Bai
Yuliang Liu
175
0
0
21 May 2024