2404.19652
VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization
30 April 2024
Yuliang Liu
Mingxin Huang
Hao Yan
Linger Deng
Weijia Wu
Hao Lu
Chunhua Shen
Lianwen Jin
Xiang Bai
Papers citing
"VimTS: A Unified Video and Image Text Spotter for Enhancing the Cross-domain Generalization"
4 / 4 papers shown
Title
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM
MLLM
73
242
0
29 Jan 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
135
895
0
21 Dec 2023
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
116
367
0
07 Nov 2023
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu
Feng Li
Hao Zhang
X. Yang
Xianbiao Qi
Hang Su
Jun Zhu
Lei Zhang
ViT
132
703
0
28 Jan 2022