EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
6 February 2024
Quan-Sen Sun, Jinsheng Wang, Qiying Yu, Yufeng Cui, Fan Zhang, Xiaosong Zhang, Xinlong Wang
VLM, CLIP, MLLM
arXiv (abs), PDF, HTML, HuggingFace (29 upvotes)

Papers citing "EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters"

25 / 25 papers shown
EBind: a practical approach to space binding
Jim Broadbent, Felix Cohen, Frederik Hvilshøj, Eric Landau, Eren Sasoglu
18 Nov 2025

A Parameter-Efficient Mixture-of-Experts Framework for Cross-Modal Geo-Localization
Linfeng Li, Jian-jun Zhao, Zepeng Yang, Yuhang Song, Bojun Lin, Tianle Zhang, Yuchen Yuan, C. Zhang, Xuelong Li
MoE
23 Oct 2025

UniME-V2: MLLM-as-a-Judge for Universal Multimodal Embedding Learning
Tiancheng Gu, Kaicheng Yang, Kaichen Zhang, Xiang An, Ziyong Feng, Y. Zhang, Weidong Cai, Jiankang Deng, Lidong Bing
15 Oct 2025

RangeSAM: On the Potential of Visual Foundation Models for Range-View represented LiDAR segmentation
Paul Julius Kühn, Duc Anh Nguyen, Arjan Kuijper, Holger Graf, Dieter W. Fellner
3DPC
19 Sep 2025

Category-level Text-to-Image Retrieval Improved: Bridging the Domain Gap with Diffusion Models and Vision Encoders
Faizan Farooq Khan, Vladan Stojnić, Zakaria Laskar, Mohamed Elhoseiny, Giorgos Tolias
DiffM, VLM
29 Aug 2025

MobileViCLIP: An Efficient Video-Text Model for Mobile Devices
Min Yang, Zihan Jia, Zhilin Dai, Sheng Guo, Limin Wang
CLIP, VLM
10 Aug 2025

Guiding Cross-Modal Representations with MLLM Priors via Preference Alignment
Pengfei Zhao, Rongbo Luan, Wei Zhang, Peng Wu, Sifeng He
08 Jun 2025

Rapid Urban Visibility Hotspots: Quantifying Building Vertex Visibility from Connected Vehicle Trajectories using Spatial Indexing
Artur Grigorev, Adriana-Simona Mihaita
03 Jun 2025

mRAG: Elucidating the Design Space of Multi-modal Retrieval-Augmented Generation
Chan-wei Hu, Yueqi Wang, Shuo Xing, Chia-Ju Chen, Zhengzhong Tu, Ryan Rossi
3DV
29 May 2025

Spa-VLM: Stealthy Poisoning Attacks on RAG-based VLM
Lei Yu, Yechao Zhang, Ziqi Zhou, Yang Wu, Wei Wan, Minghui Li, Shengshan Hu, Pei Xiaobing, Jing Wang
AAML
28 May 2025

Breaking the Batch Barrier (B3) of Contrastive Learning via Smart Batch Mining
Raghuveer Thirukovalluru, Rui Meng, Wenshu Fan, Karthikeyan K, Mingyi Su, Ping Nie, Semih Yavuz, Yingbo Zhou, Lei Ma, Bhuwan Dhingra
16 May 2025

Simple yet Effective Semi-supervised Knowledge Distillation from Vision-Language Models via Dual-Head Optimization
Seongjae Kang, Dong Bok Lee, Hyungjoon Jang, Sung Ju Hwang
VLM
12 May 2025

OMGM: Orchestrate Multiple Granularities and Modalities for Efficient Multimodal Retrieval
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei Yang, Jingjing Fu, Rongpin Wang, Jinyu Wang, Lei Song, Jiang Bian
10 May 2025

No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang, Mengmeng Wang, Liuzhuozheng Li, Lei Zhang, Haoyu Wang, Wei Wei, Guang Dai, Yanning Zhang, Jingdong Wang
DiffM
05 May 2025

Breaking the Modality Barrier: Universal Embedding Learning with Multimodal LLMs
Tiancheng Gu, Kaicheng Yang, Ziyong Feng, Xingjun Wang, Yanzhao Zhang, Dingkun Long, Yingda Chen, Weidong Cai, Jiankang Deng
VLM
24 Apr 2025

Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya, Po-Yao (Bernie) Huang, Peize Sun, Jang Hyun Cho, Andrea Madotto, ..., Shiyu Dong, Nikhila Ravi, Daniel Li, Piotr Dollár, Christoph Feichtenhofer
ObjD, VOS
17 Apr 2025

MMKB-RAG: A Multi-Modal Knowledge-Based Retrieval-Augmented Generation Framework
Zihan Ling, Zhiyao Guo, Yixuan Huang, Yi An, Shuai Xiao, Jinsong Lan, Xiaoyong Zhu, Bo Zheng
RALM, VLM
14 Apr 2025

Mind the (Data) Gap: Evaluating Vision Systems in Small Data Applications
Samuel Stevens, S M Rayeed, Jenna Kline
VLM
08 Apr 2025

FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
Carlos Plou, Cesar Borja, Ruben Martinez-Cantin, Ana C. Murillo
25 Mar 2025

HEIE: MLLM-Based Hierarchical Explainable AIGC Image Implausibility Evaluator
Computer Vision and Pattern Recognition (CVPR), 2024
Fan Yang, Ru Zhen, Jinqiao Wang, Yanhao Zhang, Haoxiang Chen, Haonan Lu, Sicheng Zhao, Guiguang Ding
26 Nov 2024

Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Information Processing in Medical Imaging (IPMI), 2024
Yuexi Du, John Onofrey, Nicha Dvornek
VLM
26 Sep 2024

CanvOI, an Oncology Intelligence Foundation Model: Scaling FLOPS Differently
Jonathan Zalach, Inbal Gazy, Assaf Avinoam, Ron Sinai, Eran Shmuel, Inbar Gilboa, Christine Swisher, Naim Matasci, Reva Basho, David B. Agus
04 Sep 2024

Scaling White-Box Transformers for Vision
Jinrui Yang, Xianhang Li, Druv Pai, Yuyin Zhou, Yi-An Ma, Yaodong Yu, Cihang Xie
ViT
30 May 2024

VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang, Shoubin Yu, Elias Stengel-Eskin, Jaehong Yoon, Feng Cheng, Gedas Bertasius, Mohit Bansal
29 May 2024

Omniview-Tuning: Boosting Viewpoint Invariance of Vision-Language Pre-training Models
Shouwei Ruan, Yinpeng Dong, Hanqing Liu, Yao Huang, Hang Su, Xingxing Wei
VLM
18 Apr 2024