2402.04252
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
6 February 2024
Quan Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
Papers citing "EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters"
7 / 7 papers shown
Title
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
35
0
0
05 May 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
98
0
0
17 Apr 2025
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao
Jingru Tan
Jiahao Hu
Feizhao Zhang
Xin Jin
...
Ruihao Gong
Pengfei Liu
Dahua Lin
Ningyi Xu
VLM
30
1
0
30 Jul 2024
On Efficient Language and Vision Assistants for Visually-Situated Natural Language Understanding: What Matters in Reading and Reasoning
Geewook Kim
Minjoon Seo
VLM
14
2
0
17 Jun 2024
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi-An Ma
Yaodong Yu
Cihang Xie
ViT
30
9
0
30 May 2024
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
126
895
0
21 Dec 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023