Synthetic Data for Text Localisation in Natural Images

22 April 2016
Ankush Gupta
Andrea Vedaldi
Andrew Zisserman

Papers citing "Synthetic Data for Text Localisation in Natural Images"

50 / 607 papers shown
Global-Local Aware Scene Text Editing
IEEE International Conference on Multimedia and Expo (ICME), 2025
Fuxiang Yang
Tonghua Su
Donglin Di
Yin Chen
Xiangqian Wu
Zhongjie Wang
Lei Fan
DiffM
138
0
0
03 Dec 2025
MDiff4STR: Mask Diffusion Model for Scene Text Recognition
Yongkun Du
Miaomiao Zhao
S. Fan
Z. Chen
Caiyan Jia
Yu-Gang Jiang
DiffM
32
0
0
01 Dec 2025
Bharat Scene Text: A Novel Comprehensive Dataset and Benchmark for Indian Language Scene Text Understanding
Anik De
A. S. Penamakuri
Rajeev Yadav
Aditya Rathore
Harshiv Shah
Devesh Sharma
Sagar Agarwal
Pravin Kumar
Anand Mishra
108
0
0
28 Nov 2025
Evaluating Multimodal Large Language Models on Vertically Written Japanese Text
Keito Sasagawa
Shuhei Kurita
Daisuke Kawahara
68
0
0
19 Nov 2025
BackWeak: Backdooring Knowledge Distillation Simply with Weak Triggers and Fine-tuning
Shanmin Wang
Dongdong Zhao
AAML
191
0
0
15 Nov 2025
PerspAct: Enhancing LLM Situated Collaboration Skills through Perspective Taking and Active Vision
Portuguese Conference on Artificial Intelligence (EPIA), 2025
Sabrina Patania
Luca Annese
Anita Pellegrini
Silvia Serino
Anna Lambiase
...
Silvia Rossi
Simone Colombani
Tom Foulsham
Azzurra Ruggeri
Dimitri Ognibene
LLMAG
192
3
0
11 Nov 2025
OmniText: A Training-Free Generalist for Controllable Text-Image Manipulation
Agus Gunawan
Samuel Teodoro
Yun Chen
Soo Ye Kim
Jihyong Oh
Munchurl Kim
DiffM
275
0
0
28 Oct 2025
GranViT: A Fine-Grained Vision Model With Autoregressive Perception For MLLMs
Guanghao Zheng
Bowen Shi
Mingxing Xu
Ruoyu Sun
Peisen Zhao
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Xiaopeng Zhang
Qi Tian
VLM
147
0
0
23 Oct 2025
ClapperText: A Benchmark for Text Recognition in Low-Resource Archival Documents
Tingyu Lin
Marco Peer
Florian Kleber
Robert Sablatnig
100
0
0
17 Oct 2025
A Large-scale Dataset for Robust Complex Anime Scene Text Detection
Ziyi Dong
Yurui Zhang
Changmao Li
Naomi Rue Golding
Qing Long
64
0
0
09 Oct 2025
OTR: Synthesizing Overlay Text Dataset for Text Removal
Jan Zdenek
Wataru Shimoda
Kota Yamaguchi
72
0
0
03 Oct 2025
Automatic Intermodal Loading Unit Identification using Computer Vision: A Scoping Review
Emre Gülsoylu
Alhassan Abdelhalim
Derya Kara Boztas
Ole Grasse
Carlos Jahn
Simone Frintrop
Janick Edinger
104
0
0
22 Sep 2025
Qianfan-VL: Domain-Enhanced Universal Vision-Language Models
Daxiang Dong
Mingming Zheng
Dong Xu
Bairong Zhuang
W. Zhang
...
Ruchang Yao
Ziye Yuan
J. Wu
Guangjun Xie
Dou Shen
VLM
79
1
0
19 Sep 2025
MiniCPM-V 4.5: Cooking Efficient MLLMs via Architecture, Data, and Training Recipe
Tianyu Yu
Zefan Wang
Chongyi Wang
Fuwei Huang
Wenshuo Ma
...
Ning Ding
Xu Han
Xingtai Lv
Zhiyuan Liu
Maosong Sun
MLLM VLM
158
20
0
16 Sep 2025
Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR
Shashank Vempati
Nishit Anand
Gaurav Talebailkar
Arpan Garai
Chetan Arora
78
0
0
29 Aug 2025
Training Kindai OCR with parallel textline images and self-attention feature distance-based loss
A. D. Le
A. Kitamoto
64
1
0
12 Aug 2025
TRUDI and TITUS: A Multi-Perspective Dataset and A Three-Stage Recognition System for Transportation Unit Identification
Emre Gülsoylu
A. Kelm
Lennart Bengtson
Matthias Hirsch
Christian Wilms
Tim Rolff
Janick Edinger
Simone Frintrop
103
0
0
04 Aug 2025
TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
Xiahan Yang
Hui Zheng
VLM
90
1
0
02 Aug 2025
Hyper-Local Deformable Transformers for Text Spotting on Historical Maps
Knowledge Discovery and Data Mining (KDD), 2024
Yijun Lin
Yao-Yi Chiang
116
7
0
17 Jun 2025
MSTAR: Box-free Multi-query Scene Text Retrieval with Attention Recycling
Liang Yin
Xudong Xie
Zhang Li
Xiang Bai
Yuliang Liu
LRM
282
0
0
12 Jun 2025
Text-Aware Image Restoration with Diffusion Models
Jaewon Min
J. Kim
Paul Hyunbin Cho
J. Lee
Jihye Park
Minkyu Park
S. Kim
Hyunhee Park
Seungryong Kim
274
1
0
11 Jun 2025
TextSR: Diffusion Super-Resolution with Multilingual OCR Guidance
Keren Ye
Ignacio Garcia Dorado
Michalis Raptis
M. Delbracio
Irene Zhu
P. Milanfar
Hossein Talebi
245
1
0
29 May 2025
Syn3DTxt: Embedding 3D Cues for Scene Text Generation
Li-Syun Hsiung
Jun-Kai Tu
Kuan-Wu Chu
Yu-Hsuan Chiu
Yan-Tsung Peng
Sheng-Luen Chung
Gee-Sern Jison Hsu
165
0
0
24 May 2025
The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection
International Joint Conference on Artificial Intelligence (IJCAI), 2025
Tianjiao Cao
Jiahao Lyu
Weichao Zeng
Weimin Mu
Can Ma
279
1
0
21 May 2025
LogicOCR: Do Your Large Multimodal Models Excel at Logical Reasoning on Text-Rich Images?
Maoyuan Ye
Jing Zhang
Juhua Liu
Bo Du
Dacheng Tao
LRM
510
1
0
18 May 2025
PsOCR: Benchmarking Large Multimodal Models for Optical Character Recognition in Low-resource Pashto Language
Ijazul Haq
Yingjie Zhang
Irfan Ali Khan
300
0
0
15 May 2025
Joint Low-level and High-level Textual Representation Learning with Multiple Masking Strategies
Zhengmi Tang
Yuto Mitsui
Tomo Miyazaki
S. Omachi
276
0
0
11 May 2025
Visual Text Processing: A Comprehensive Review and Unified Evaluation
Yan Shu
Weichao Zeng
Fangmin Zhao
Zeyu Chen
Zhiyu Li
...
Paolo Rota
Xiang Bai
Lianwen Jin
Xu-Cheng Yin
Andrii Zadaianchuk
CoGe
420
6
0
30 Apr 2025
Skip-Vision: Efficient and Scalable Acceleration of Vision-Language Models via Adaptive Token Skipping
Weili Zeng
Ziyuan Huang
Kaixiang Ji
Manwen Liao
VLM
590
4
0
26 Mar 2025
From Fragment to One Piece: A Survey on AI-Driven Graphic Design
Xingxing Zou
Wen Zhang
Nanxuan Zhao
320
3
0
24 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Computer Vision and Pattern Recognition (CVPR), 2025
Yifei Zhang
Yu Xie
Jin Wei
Xiaomeng Yang
Can Ma
Xiangyang Ji
288
7
0
24 Mar 2025
A Context-Driven Training-Free Network for Lightweight Scene Text Segmentation and Recognition
Ritabrata Chakraborty
Shivakumara Palaiahnakote
Umapada Pal
Cheng-Lin Liu
VLM
258
1
0
19 Mar 2025
Marten: Visual Question Answering with Mask Generation for Multi-modal Document Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Zining Wang
Tongkun Guan
Pei Fu
Chen Duan
Qianyi Jiang
Zhentao Guo
Shan Guo
Junfeng Luo
Wei Shen
Yunbo Wang
MLLM VLM
220
7
0
18 Mar 2025
Scale Efficient Training for Large Datasets
Computer Vision and Pattern Recognition (CVPR), 2025
Qing Zhou
Junyu Gao
Qi Wang
DD
320
3
0
17 Mar 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM VLM
546
12
0
26 Feb 2025
Megrez-Omni Technical Report
Boxun Li
Yadong Li
Hui Yuan
Congyi Liu
Weilin Liu
...
Dong Zhou
Yueqing Zhuang
Shengen Yan
Guohao Dai
Longji Xu
211
1
0
19 Feb 2025
PLATTER: A Page-Level Handwritten Text Recognition System for Indic Scripts
Badri Vishal Kasuba
Dhruv Kudale
Venkatapathy Subramanian
P. Chaudhuri
Ganesh Ramakrishnan
278
1
0
10 Feb 2025
Beyond Flat Text: Dual Self-inherited Guidance for Visual Text Generation
Minxing Luo
Zixun Xia
L. Chen
Zhenhang Li
Weichao Zeng
Jinqiao Wang
Wentao Cheng
Yaxing Wang
Can Ma
Jian Yang
DiffM
281
1
0
10 Jan 2025
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Jiawei Liu
Yuanzhi Zhu
Feiyu Gao
Zhiyong Yang
P. Wang
Junyang Lin
Xinyu Wang
Wenyu Liu
DiffM
313
0
0
08 Jan 2025
Instruction-Guided Scene Text Recognition
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Yongkun Du
Z. Chen
Yuchen Su
Caiyan Jia
Yu-Gang Jiang
469
17
0
03 Jan 2025
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye
Yongkun Du
Yunbo Tao
Z. Chen
DiffM
417
2
0
02 Dec 2024
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du
Z. Chen
Hongtao Xie
Caiyan Jia
Yu-Gang Jiang
375
18
0
24 Nov 2024
Boosting Semi-Supervised Scene Text Recognition via Viewing and Summarizing
Neural Information Processing Systems (NeurIPS), 2024
Yadong Qu
Yuxin Wang
Bangbang Zhou
Zihan Wang
Hongtao Xie
Yongdong Zhang
220
2
0
23 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
315
0
0
18 Nov 2024
Real-Time Text Detection with Similar Mask in Traffic, Industrial, and Natural Scenes
Xu Han
Junyu Gao
Chuang Yang
Yuan Yuan
Qi Wang
189
5
0
05 Nov 2024
High-Fidelity Document Stain Removal via A Large-Scale Real-World Dataset and A Memory-Augmented Transformer
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Mingxian Li
Hao Sun
Yingtie Lei
Xiaofeng Zhang
Yihang Dong
Yilin Zhou
Zimeng Li
Xuhang Chen
295
32
0
30 Oct 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
375
84
0
21 Oct 2024
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
Neural Information Processing Systems (NeurIPS), 2024
Weichao Zeng
Yan Shu
Zhenhang Li
Dongbao Yang
Can Ma
DiffM
228
23
0
14 Oct 2024
CodeSCAN: ScreenCast ANalysis for Video Programming Tutorials
Alexander Naumann
Felix Hertlein
Jacqueline Höllig
Lucas Cazzonelli
Steffen Thoma
100
1
0
27 Sep 2024
AI-Powered Augmented Reality for Satellite Assembly, Integration and Test
Alvaro Patricio
Joao Valente
Atabak Dehban
Ines Cadilha
Daniel Reis
Rodrigo Ventura
117
3
0
26 Sep 2024