An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2015
21 July 2015
Baoguang Shi
X. Bai
Cong Yao
    VLM

Papers citing "An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition"

50 / 678 papers shown
HunyuanOCR Technical Report
Hunyuan Vision Team
Pengyuan Lyu
Xingyu Wan
Gengluo Li
Shangpin Peng
...
Jianchen Zhu
Jie Jiang
Linus
Han Hu
Chengquan Zhang
MLLM
VLM
438
0
0
24 Nov 2025
TopoReformer: Mitigating Adversarial Attacks Using Topological Purification in OCR Models
Bhagyesh Kumar
A S Aravinthakashan
Akshat Satyanarayan
Ishaan Gakhar
Ujjwal Verma
AAML
72
0
0
19 Nov 2025
OTSNet: A Neurocognitive-Inspired Observation-Thinking-Spelling Pipeline for Scene Text Recognition
Lixu Sun
Nurmemet Yolwas
Wushour Silamu
42
0
0
11 Nov 2025
SilhouetteTell: Practical Video Identification Leveraging Blurred Recordings of Video Subtitles
Guanchong Huang
Song Fang
71
0
0
31 Oct 2025
Efficient License Plate Recognition via Pseudo-Labeled Supervision with Grounding DINO and YOLOv8
International Workshop on Machine Learning for Signal Processing (MLSP), 2025
Zahra Ebrahimi Vargoorani
Amir Mohammad Ghoreyshi
Ching Yee Suen
53
0
0
28 Oct 2025
Tibetan Language and AI: A Comprehensive Survey of Resources, Methods and Challenges
Cheng Huang
Nyima Tashi
Fan Gao
Yutong Liu
J. Li
...
Guojie Tang
Xiangxiang Wang
Jia Zhang
Tsengdar J. Lee
Yongbin Yu
96
0
0
22 Oct 2025
CharDiff-LP: A Diffusion Model with Character-Level Guidance for License Plate Image Restoration
Gyuhwan Park
Kihyun Na
Injung Kim
DiffM
88
0
0
20 Oct 2025
Scaling Beyond Context: A Survey of Multimodal Retrieval-Augmented Generation for Document Understanding
Sensen Gao
Shanshan Zhao
Xu Jiang
Lunhao Duan
Yong Xien Chng
Qing-Guo Chen
Weihua Luo
Kaifu Zhang
Jia-Wang Bian
Mingming Gong
186
0
0
17 Oct 2025
ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents
Tingyu Lin
Marco Peer
Florian Kleber
Robert Sablatnig
96
0
0
17 Oct 2025
UniCalli: A Unified Diffusion Framework for Column-Level Generation and Recognition of Chinese Calligraphy
Tianshuo Xu
Kai Wang
Zhifei Chen
Leyi Wu
Tianshui Wen
Fei Chao
Ying-Cong Chen
DiffM
56
0
0
15 Oct 2025
Exploring OCR-augmented Generation for Bilingual VQA
JoonHo Lee
Sunho Park
VLM
84
0
0
02 Oct 2025
LadderMoE: Ladder-Side Mixture of Experts Adapters for Bronze Inscription Recognition
Rixin Zhou
Peiqiang Qiu
Qian Zhang
Chuntao Li
Xi Yang
104
0
0
02 Oct 2025
From Videos to Indexed Knowledge Graphs -- Framework to Marry Methods for Multimodal Content Analysis and Understanding
Basem Rizk
Joel Walsh
Mark Core
Benjamin Nye
76
0
0
01 Oct 2025
Automatic Intermodal Loading Unit Identification using Computer Vision: A Scoping Review
Emre Gülsoylu
Alhassan Abdelhalim
Derya Kara Boztas
Ole Grasse
Carlos Jahn
Simone Frintrop
Janick Edinger
92
0
0
22 Sep 2025
GraDeT-HTR: A Resource-Efficient Bengali Handwritten Text Recognition System utilizing Grapheme-based Tokenizer and Decoder-only Transformer
Md. Mahmudul Hasan
Ahmed Nesar Tahsin Choudhury
Mahmudul Hasan
Md. Mosaddek Khan
104
0
0
22 Sep 2025
Emotion-Aware Speech Generation with Character-Specific Voices for Comics
Zhiwen Qian
Jinhua Liang
Huan Zhang
85
0
0
18 Sep 2025
Exploring Light-Weight Object Recognition for Real-Time Document Detection
Lucas Wojcik
Luiz Coelho
Roger Granada
David Menotti
164
0
0
07 Sep 2025
The Telephone Game: Evaluating Semantic Drift in Unified Models
Sabbir Mollah
Rohit Gupta
S. Swetha
Qingyang Liu
Ahnaf Munir
Mubarak Shah
VLM
115
1
0
04 Sep 2025
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Shashank Vempati
Nishit Anand
Gaurav Talebailkar
Arpan Garai
Chetan Arora
62
0
0
29 Aug 2025
Quo Vadis Handwritten Text Generation for Handwritten Text Recognition?
Vittorio Pippi
Konstantina Nikolaidou
S. Cascianelli
George Retsinas
Giorgos Sfikas
Rita Cucchiara
Marcus Liwicki
DiffM
87
1
0
13 Aug 2025
Enhanced Generative Structure Prior for Chinese Text Image Super-resolution
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2025
Xiaoming Li
Wangmeng Zuo
Chen Change Loy
143
0
0
11 Aug 2025
TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
Xiahan Yang
Hui Zheng
VLM
90
1
0
02 Aug 2025
LPTR-AFLNet: Lightweight Integrated Chinese License Plate Rectification and Recognition Network
Guangzhu Xu
Pengcheng Zuo
Zhi Ke
Bangjun Lei
155
0
0
22 Jul 2025
Hyper-Local Deformable Transformers for Text Spotting on Historical Maps
Knowledge Discovery and Data Mining (KDD), 2024
Yijun Lin
Yao-Yi Chiang
92
7
0
17 Jun 2025
Text-Aware Image Restoration with Diffusion Models
Jaewon Min
J. Kim
Paul Hyunbin Cho
J. Lee
Jihye Park
Minkyu Park
S. Kim
Hyunhee Park
Seungryong Kim
250
1
0
11 Jun 2025
Learning to Align: Addressing Character Frequency Distribution Shifts in Handwritten Text Recognition
Panagiotis Kaliosis
John Pavlopoulos
185
1
0
11 Jun 2025
Task-driven real-world super-resolution of document scans
Applied Sciences (AS), 2025
Maciej Zyrek
Tomasz Tarasiewicz
Jakub Sadel
Aleksandra Krzywon
M. Kawulok
SupR
93
0
0
08 Jun 2025
Sign Language: Towards Sign Understanding for Robot Autonomy
Ayush Agrawal
Joel Loo
Nicky Zimmerman
David Hsu
SLR
201
0
0
03 Jun 2025
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance
Keren Ye
Ignacio Garcia Dorado
Michalis Raptis
M. Delbracio
Irene Zhu
P. Milanfar
Hossein Talebi
209
1
0
29 May 2025
WriteViT: Handwritten Text Generation with Vision Transformer
Dang Hoai Nam
Huynh Tong Dang Khoa
Vo Nguyen Le Duy
ViT
185
0
0
19 May 2025
Revisiting SSL for sound event detection: complementary fusion and adaptive post-processing
Journal of King Saud University: Computer and Information Sciences (J. King Saud Univ. Comput. Inf. Sci.), 2025
Hanfang Cui
Longfei Song
Li Li
Dongxing Xu
Yanhua Long
295
0
0
17 May 2025
DOTA: Deformable Optimized Transformer Architecture for End-to-End Text Recognition with Retrieval-Augmented Generation
International Conference on Knowledge and Smart Technology (ICKST), 2025
Naphat Nithisopa
Teerapong Panboonyuen
ViT
186
0
0
07 May 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Zhiyu Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
Andrii Zadaianchuk
CoGe
380
6
0
30 Apr 2025
Single Document Image Highlight Removal via A Large-Scale Real-World Dataset and A Location-Aware Network
Lu Pan
Yu-Hsuan Huang
Hongxia Xie
Cheng Zhang
H Zhao
Hong-Han Shuai
Wen-Huang Cheng
115
0
0
19 Apr 2025
ViMo: A Generative Visual GUI World Model for App Agents
Dezhao Luo
Bohan Tang
Kang Li
Georgios Papoudakis
Jifei Song
S. Gong
Haifeng Zhang
Jun Wang
Cheng Deng
LM&Ro
VGen
474
3
0
15 Apr 2025
Enabling Collaborative Parametric Knowledge Calibration for Retrieval-Augmented Vision Question Answering
Jiaqi Deng
Kaize Shi
Zonghan Wu
Huan Huo
Dingxian Wang
Guandong Xu
165
0
0
05 Apr 2025
Learning Phase Distortion with Selective State Space Models for Video Turbulence Mitigation
Computer Vision and Pattern Recognition (CVPR), 2025
Xingguang Zhang
Nicholas Chimitt
Xijun Wang
Yu Yuan
Stanley H. Chan
337
2
0
03 Apr 2025
Leveraging Contrast Information for Efficient Document Shadow Removal
Wenshu Fan
Jiancheng Huang
Na Liu
Mingfu Yan
Yi Huang
Shifeng Chen
297
1
0
01 Apr 2025
NCAP: Scene Text Image Super-Resolution with Non-CAtegorical Prior
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Dongwoo Park
Suk Pil Ko
890
0
0
01 Apr 2025
Improving Applicability of Deep Learning based Token Classification models during Training
Anket Mehra
Malte Prieß
Marian Himstedt
253
0
0
28 Mar 2025
Practical Fine-Tuning of Autoregressive Models on Limited Handwritten Texts
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Jan Kohút
Michal Hradiš
285
0
0
25 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Yifei Zhang
Yu Xie
Jin Wei
Xiaomeng Yang
Yu Zhou
Can Ma
Xiangyang Ji
260
7
0
24 Mar 2025
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Computer Vision and Pattern Recognition (CVPR), 2025
Andrea Maracani
Savas Ozkan
Sijun Cho
Hyowon Kim
Eunchung Noh
Jeongwon Min
Cho Jung Min
Dookun Park
Mete Ozay
350
1
0
20 Mar 2025
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition
Ritabrata Chakraborty
Shivakumara Palaiahnakote
Umapada Pal
Cheng-Lin Liu
VLM
238
1
0
19 Mar 2025
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2025
Harshal Kausadikar
Tanvi Kale
Onkar Susladkar
Sparsh Mittal
256
0
0
17 Mar 2025
TextDoctor: Unified Document Image Inpainting via Patch Pyramid Diffusion Models
Wanglong Lu
Lingming Su
Jingjing Zheng
Vinícius Veloso de Melo
Farzaneh Shoeleh
J. Hawkin
T. Tricco
Hanli Zhao
Xianta Jiang
DiffM
248
2
0
06 Mar 2025
MathMistake Checker: A Comprehensive Demonstration for Step-by-Step Math Problem Mistake Finding by Prompt-Guided LLMs
AAAI Conference on Artificial Intelligence (AAAI), 2025
T. Zhang
Zhuoxuan Jiang
Haotian Zhang
Lin Lin
Shaohua Zhang
LRM
246
1
0
06 Mar 2025
DashCop: Automated E-ticket Generation for Two-Wheeler Traffic Violations Using Dashcam Videos
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2025
Deepti Rawat
Keshav Gupta
Aryamaan Basu Roy
Ravi Kiran Sarvadevabhatla
259
1
0
01 Mar 2025
OmniParser V2: Structured-Points-of-Thought for Unified Visual Text Parsing and Its Generality to Multimodal Large Language Models
Wenwen Yu
Zhibo Yang
Jianqiang Wan
Sibo Song
J. Tang
Wenqing Cheng
Yunxing Liu
Xiang Bai
299
13
0
22 Feb 2025
Handwritten Text Recognition: A Survey
Carlos Garrido-Munoz
Antonio Ríos-Vila
Jorge Calvo-Zaragoza
267
5
0
12 Feb 2025