1601.07140
Cited By
COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images
26 January 2016
Andreas Veit
Tomas Matera
Lukáš Neumann
Jiří Matas
Serge J. Belongie
Papers citing "COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images"
11 / 11 papers shown
M2-omni: Advancing Omni-MLLM for Comprehensive Modality Support with Competitive Performance
Qingpei Guo
Kaiyou Song
Zipeng Feng
Ziping Ma
Qinglong Zhang
...
Yunxiao Sun
Tai-Wei Chang
Jingdong Chen
Ming Yang
Jun Zhou
MLLM
VLM
54
3
0
26 Feb 2025
Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo
Xue Yang
Wenhan Dou
Zhaokai Wang
Jifeng Dai
Yu Qiao
Xizhou Zhu
VLM
MLLM
25
25
0
10 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jenq-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
41
25
0
04 Oct 2024
Out of Length Text Recognition with Sub-String Matching
Yongkun Du
Zhineng Chen
Caiyan Jia
Xieping Gao
Yu-Gang Jiang
20
2
0
17 Jul 2024
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
25
6
0
27 May 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
Weifeng Lin
Xinyu Wei
Ruichuan An
Peng Gao
Bocheng Zou
Yulin Luo
Siyuan Huang
Shanghang Zhang
Hongsheng Li
VLM
27
31
0
29 Mar 2024
CoTBal: Comprehensive Task Balancing for Multi-Task Visual Instruction Tuning
Yanqi Dai
Dong Jing
Nanyi Fei
Guoxing Yang
Zhiwu Lu
25
2
0
07 Mar 2024
IPAD: Iterative, Parallel, and Diffusion-based Network for Scene Text Recognition
Xiaomeng Yang
Zhi Qiao
Yu Zhou
DiffM
12
1
0
19 Dec 2023
Domain Adaptive Scene Text Detection via Subcategorization
Zichen Tian
Chuhui Xue
Jingyi Zhang
Shijian Lu
15
3
0
01 Dec 2022
MIDV-500: A Dataset for Identity Documents Analysis and Recognition on Mobile Devices in Video Stream
V. Arlazarov
K. Bulatov
T. Chernov
Vladimir L. Arlazarov
8
108
0
16 Jul 2018
Single Shot Text Detector with Regional Attention
Pan He
Weilin Huang
Tong He
Qile Zhu
Yu Qiao
Xiaolin Li
VLM
10
296
0
01 Sep 2017