1912.09641
Cited By
ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2019
20 December 2019
Xi Liu
Rui Zhang
Yongsheng Zhou
Qianyi Jiang
Qi Song
Nan Li
Kai Zhou
Lei Wang
Dong Wang
Minghui Liao
Mingkun Yang
X. Bai
Baoguang Shi
Dimosthenis Karatzas
Shijian Lu
C. V. Jawahar
3DV
Papers citing
"ICDAR 2019 Robust Reading Challenge on Reading Chinese Text on Signboard"
50 / 105 papers shown
LocateAnything3D: Vision-Language 3D Detection with Chain-of-Sight
Yunze Man
S. S. Wang
Guowen Zhang
Johan Bjorck
Zhiqi Li
Liang-Yan Gui
Jim Fan
Jan Kautz
Yu Wang
Zhiding Yu
89
0
0
25 Nov 2025
LRANet++: Low-Rank Approximation Network for Accurate and Efficient Text Spotting
Yuchen Su
Z. Chen
Yongkun Du
Zuxuan Wu
Hongtao Xie
Yu-Gang Jiang
60
2
0
08 Nov 2025
NVIDIA Nemotron Nano V2 VL
Nvidia
Amala Sanjay Deshmukh
Kateryna Chumachenko
Tuomas Rintamaki
Matthieu Le
...
Krzysztof Pawelec
Michael Evans
Katherine Luna
Jie Lou
Erick Galinkin
VLM
256
1
0
06 Nov 2025
From Pixels to Words -- Towards Native Vision-Language Primitives at Scale
Haiwen Diao
Mingxuan Li
Silei Wu
Linjun Dai
Xiaohua Wang
Hanming Deng
Lewei Lu
Dahua Lin
Ziwei Liu
VLM
116
0
0
16 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
380
3
0
15 Oct 2025
Detect Anything via Next Point Prediction
Qing Jiang
Junan Huo
Xingyu Chen
Yuda Xiong
Zhaoyang Zeng
Yihao Chen
Tianhe Ren
Junzhi Yu
Lei Zhang
ObjD
191
11
0
14 Oct 2025
A Large-scale Dataset for Robust Complex Anime Scene Text Detection
Ziyi Dong
Yurui Zhang
Changmao Li
Naomi Rue Golding
Qing Long
64
0
0
09 Oct 2025
DianJin-OCR-R1: Enhancing OCR Capabilities via a Reasoning-and-Tool Interleaved Vision-Language Model
Qian Chen
Xianyin Zhang
Lifan Guo
Feng-Xiang Chen
Chi Zhang
MLLM
LRM
109
4
0
18 Aug 2025
TEACH: Text Encoding as Curriculum Hints for Scene Text Recognition
Xiahan Yang
Hui Zheng
VLM
90
1
0
02 Aug 2025
CoMemo: LVLMs Need Image Context with Image Memory
Shi-Qi Liu
Weijie Su
Xizhou Zhu
Wenhai Wang
Jifeng Dai
VLM
161
0
0
06 Jun 2025
DanceText: A Training-Free Layered Framework for Controllable Multilingual Text Transformation in Images
Zhenyu Yu
Mohd Yamani Idna Idris
Pei Wang
Yuelong Xia
Rizwan Qureshi
Shaina Raza
Aman Chadha
Yong Xiang
Zhixiang Chen
DiffM
200
1
0
18 Apr 2025
Capybara-OMNI: An Efficient Paradigm for Building Omni-Modal Language Models
Xingguang Ji
Jiakang Wang
Hongzhi Zhang
Jingyuan Zhang
Haonan Zhou
Chenxi Sun
Wenshu Fan
Qi Wang
Fuzheng Zhang
MLLM
VLM
250
1
0
10 Apr 2025
Data Metabolism: An Efficient Data Design Schema For Vision Language Model
Jingyuan Zhang
Hongzhi Zhang
Zhou Haonan
Chenxi Sun
Xingguang Ji
Jiakang Wang
Fanheng Kong
Wenshu Fan
Qi Wang
Fuzheng Zhang
VLM
349
2
0
10 Apr 2025
DocSAM: Unified Document Image Segmentation via Query Decomposition and Heterogeneous Mixed Learning
Computer Vision and Pattern Recognition (CVPR), 2025
Xiao-Hui Li
Fei Yin
Cheng-Lin Liu
245
2
0
05 Apr 2025
Accurate Scene Text Recognition with Efficient Model Scaling and Cloze Self-Distillation
Computer Vision and Pattern Recognition (CVPR), 2025
Andrea Maracani
Savas Ozkan
Sijun Cho
Hyowon Kim
Eunchung Noh
Jeongwon Min
Cho Jung Min
Dookun Park
Mete Ozay
350
1
0
20 Mar 2025
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
Haiyang Yu
Jinghui Lu
Yanjie Wang
Yang Li
Han Wang
...
B. Li
Teng Fu
Yang Liu
Jun Liu
Hong Chen
VLM
205
4
0
06 Mar 2025
A Token-level Text Image Foundation Model for Document Understanding
Tongkun Guan
Zining Wang
Pei Fu
Zhengtao Guo
Wei Shen
...
Chen Duan
Hao Sun
Qianyi Jiang
Junfeng Luo
Yunbo Wang
VLM
454
4
0
04 Mar 2025
Judge a Book by its Cover: Investigating Multi-Modal LLMs for Multi-Page Handwritten Document Transcription
Benjamin Gutteridge
Matthew Thomas Jackson
Toni Kukurin
Xiaowen Dong
127
0
0
27 Feb 2025
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
490
12
0
26 Feb 2025
SceneVTG++: Controllable Multilingual Visual Text Generation in the Wild
Jiawei Liu
Yuanzhi Zhu
Feiyu Gao
Zhiyong Yang
P. Wang
Junyang Lin
Xinyu Wang
Wenyu Liu
DiffM
297
0
0
08 Jan 2025
HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding
Computer Vision and Pattern Recognition (CVPR), 2024
Chenxin Tao
Shiqian Su
X. Zhu
Chenyu Zhang
Zhe Chen
...
Wenhai Wang
Lewei Lu
Gao Huang
Yu Qiao
Jifeng Dai
MLLM
VLM
431
5
0
20 Dec 2024
SynerGen-VL: Towards Synergistic Image Understanding and Generation with Vision Experts and Token Folding
Computer Vision and Pattern Recognition (CVPR), 2024
Hao Li
Changyao Tian
Jie Shao
X. Zhu
Zhaokai Wang
...
Wenhan Dou
Xiaogang Wang
Jiaming Song
Lewei Lu
Jifeng Dai
MLLM
287
31
0
12 Dec 2024
TextSSR: Diffusion-based Data Synthesis for Scene Text Recognition
Xingsong Ye
Yongkun Du
Yunbo Tao
Z. Chen
DiffM
386
2
0
02 Dec 2024
Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance
Zhangwei Gao
Zhe Chen
Erfei Cui
Yiming Ren
Weiyun Wang
...
Lewei Lu
Tong Lu
Yu Qiao
Jifeng Dai
Wenhai Wang
VLM
339
83
0
21 Oct 2024
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
321
64
0
10 Oct 2024
Text Image Generation for Low-Resource Languages with Dual Translation Learning
Chihiro Noguchi
Shun Fukuda
Shoichiro Mihara
Masao Yamanaka
DiffM
180
0
0
26 Sep 2024
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
European Conference on Computer Vision (ECCV), 2024
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
169
1
0
20 Sep 2024
VL-Reader: Vision and Language Reconstructor is an Effective Scene Text Recognizer
ACM Multimedia (MM), 2024
Humen Zhong
Zhibo Yang
Zhaohai Li
Peng Wang
Jun Tang
Wenqing Cheng
Cong Yao
208
4
0
18 Sep 2024
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
Mohammad Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
277
111
0
17 Sep 2024
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model
Haoran Wei
Chenglong Liu
Jinyue Chen
Jia Wang
Lingyu Kong
...
Liang Zhao
Jianjian Sun
Yuang Peng
Chunrui Han
Xiangyu Zhang
VLM
159
107
0
03 Sep 2024
FastTextSpotter: A High-Efficiency Transformer for Multilingual Scene Text Spotting
International Conference on Pattern Recognition (ICPR), 2024
Alloy Das
Sanket Biswas
Umapada Pal
Josep Lladós
Saumik Bhattacharya
236
6
0
27 Aug 2024
Extremely Fine-Grained Visual Classification over Resembling Glyphs in the Wild
F. Bougourzi
Fadi Dornaika
Chongsheng Zhang
190
0
0
25 Aug 2024
Decoder Pre-Training with only Text for Scene Text Recognition
ACM Multimedia (MM), 2024
Shuai Zhao
Yongkun Du
Zhineng Chen
Yu-Gang Jiang
134
5
0
11 Aug 2024
Self-Supervised Learning for Text Recognition: A Critical Survey
International Journal of Computer Vision (IJCV), 2024
Carlos Peñarrubia
J. J. Valero-Mas
Jorge Calvo-Zaragoza
340
4
0
29 Jul 2024
Out of Length Text Recognition with Sub-String Matching
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu-Gang Jiang
429
4
0
17 Jul 2024
How Control Information Influences Multilingual Text Image Generation and Editing?
Boqiang Zhang
Zuan Gao
Yadong Qu
Hongtao Xie
DiffM
262
7
0
16 Jul 2024
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output
Pan Zhang
Xiaoyi Dong
Yuhang Zang
Yuhang Cao
Rui Qian
...
Kai Chen
Jifeng Dai
Yu Qiao
Dahua Lin
Jiaqi Wang
240
169
0
03 Jul 2024
OmniCorpus: A Unified Multimodal Corpus of 10 Billion-Level Images Interleaved with Text
Qingyun Li
Zhe Chen
Weiyun Wang
Wenhai Wang
Shenglong Ye
...
Dahua Lin
Yu Qiao
Botian Shi
Conghui He
Jifeng Dai
VLM
OffRL
200
46
0
12 Jun 2024
MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu
Zeyang Zhou
Kexin Huang
Dandan Liang
Yixu Wang
...
Keqing Wang
Yujiu Yang
Yan Teng
Botian Shi
Yingchun Wang
ELM
217
31
0
11 Jun 2024
StrucTexTv3: An Efficient Vision-Language Model for Text-rich Image Perception, Comprehension, and Beyond
Pengyuan Lyu
Yulin Li
Hao Zhou
Weihong Ma
Xingyu Wan
...
Liang Wu
Chengquan Zhang
Kun Yao
Errui Ding
Jingdong Wang
294
12
0
31 May 2024
HAAP: Vision-context Hierarchical Attention Autoregressive with Adaptive Permutation for Scene Text Recognition
Honghui Chen
Yuhang Qiu
Jiabao Wang
Pingping Chen
Nam Ling
172
0
0
15 May 2024
How Far Are We to GPT-4V? Closing the Gap to Commercial Multimodal Models with Open-Source Suites
Zhe Chen
Weiyun Wang
Hao Tian
Shenglong Ye
Zhangwei Gao
...
Tong Lu
Dahua Lin
Yu Qiao
Jifeng Dai
Wenhai Wang
MLLM
VLM
442
957
0
25 Apr 2024
InternLM-XComposer2-4KHD: A Pioneering Large Vision-Language Model Handling Resolutions from 336 Pixels to 4K HD
Neural Information Processing Systems (NeurIPS), 2024
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Xingcheng Zhang
Jifeng Dai
Yuxin Qiao
Dahua Lin
Yuan Liu
VLM
MLLM
224
159
0
09 Apr 2024
JSTR: Judgment Improves Scene Text Recognition
Masato Fujitake
206
1
0
09 Apr 2024
Bridging the Gap Between End-to-End and Two-Step Text Spotting
Mingxin Huang
Hongliang Li
Yuliang Liu
Xiang Bai
Lianwen Jin
193
10
0
06 Apr 2024
Open-Vocabulary Scene Text Recognition via Pseudo-Image Labeling and Margin Loss
Xuhua Ren
Hengcan Shi
Jin Li
VLM
206
0
0
12 Mar 2024
DeepSeek-VL: Towards Real-World Vision-Language Understanding
Haoyu Lu
Wen Liu
Bo Zhang
Bing-Li Wang
Kai Dong
...
Yaofeng Sun
Chengqi Deng
Hanwei Xu
Zhenda Xie
Chong Ruan
VLM
373
622
0
08 Mar 2024
Class-Aware Mask-Guided Feature Refinement for Scene Text Recognition
Mingkun Yang
Biao Yang
Minghui Liao
Yingying Zhu
X. Bai
VLM
190
19
0
21 Feb 2024
Dynamic Relation Transformer for Contextual Text Block Detection
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2024
Jiawei Wang
Shunchi Zhang
Kai Hu
Chixiang Ma
Zhuoyao Zhong
Lei-huan Sun
Qiang Huo
122
1
0
17 Jan 2024
SwinTextSpotter v2: Towards Better Synergy for Scene Text Spotting
International Journal of Computer Vision (IJCV), 2024
Mingxin Huang
Dezhi Peng
Hongliang Li
Zhenghao Peng
Chongyu Liu
Dahua Lin
Yuliang Liu
Xiang Bai
Lianwen Jin
311
6
0
15 Jan 2024