ResearchTrend.AI
EVA-CLIP: Improved Training Techniques for CLIP at Scale

27 March 2023
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
    CLIP
    VLM

Papers citing "EVA-CLIP: Improved Training Techniques for CLIP at Scale"

50 / 357 papers shown
Title
It's Just Another Day: Unique Video Captioning by Discriminative Prompting
Toby Perrett
Tengda Han
Dima Damen
Andrew Zisserman
19
3
0
15 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
51
7
0
14 Oct 2024
LG-CAV: Train Any Concept Activation Vector with Language Guidance
Qihan Huang
Jie Song
Mengqi Xue
H. Zhang
Bingde Hu
Huiqiong Wang
Hao Jiang
Xingen Wang
Mingli Song
VLM
19
0
0
14 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
50
0
0
14 Oct 2024
TULIP: Token-length Upgraded CLIP
Ivona Najdenkoska
Mohammad Mahdi Derakhshani
Yuki M. Asano
N. V. Noord
Marcel Worring
Cees G. M. Snoek
VLM
43
3
0
13 Oct 2024
Foundation Model-Powered 3D Few-Shot Class Incremental Learning via Training-free Adaptor
Sahar Ahmadi
A. Cheraghian
Morteza Saberi
Md. Towsif Abir
Hamidreza Dastmalchi
Farookh Hussain
Shafin Rahman
3DPC
26
2
0
11 Oct 2024
MiRAGeNews: Multimodal Realistic AI-Generated News Detection
Runsheng Huang
Liam Dugan
Y. Yang
Chris Callison-Burch
23
3
0
11 Oct 2024
Deciphering Cross-Modal Alignment in Large Vision-Language Models with Modality Integration Rate
Qidong Huang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Jiaqi Wang
Dahua Lin
Weiming Zhang
Nenghai Yu
49
5
0
09 Oct 2024
Temporal Image Caption Retrieval Competition -- Description and Results
Jakub Pokrywka
Piotr Wierzchoń
Kornel Weryszko
Krzysztof Jassem
31
0
0
08 Oct 2024
Training-Free Open-Ended Object Detection and Segmentation via Attention as Prompts
Zhiwei Lin
Yongtao Wang
Zhi Tang
ObjD
VLM
25
2
0
08 Oct 2024
Enhancing Temporal Modeling of Video LLMs via Time Gating
Zi-Yuan Hu
Yiwu Zhong
Shijia Huang
M. Lyu
Liwei Wang
VLM
26
0
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
30
14
0
08 Oct 2024
DAMRO: Dive into the Attention Mechanism of LVLM to Reduce Object Hallucination
Xuan Gong
Tianshi Ming
Xinpeng Wang
Zhihua Wei
MLLM
42
10
0
06 Oct 2024
Investigating and Mitigating Object Hallucinations in Pretrained Vision-Language (CLIP) Models
Yufang Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Aimin Zhou
VLM
MLLM
33
1
0
04 Oct 2024
Toward a Holistic Evaluation of Robustness in CLIP Models
Weijie Tu
Weijian Deng
Tom Gedeon
VLM
31
5
0
02 Oct 2024
Solution for OOD-CV Workshop SSB Challenge 2024 (Open-Set Recognition Track)
Mingxu Feng
Dian Chao
Peng Zheng
Yang Yang
21
0
0
30 Sep 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
51
11
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
43
1
0
26 Sep 2024
VL4AD: Vision-Language Models Improve Pixel-wise Anomaly Detection
Liangyu Zhong
Joachim Sicking
Fabian Hüger
Hanno Gottschalk
VLM
28
0
0
25 Sep 2024
MM-CamObj: A Comprehensive Multimodal Dataset for Camouflaged Object Scenarios
Jiacheng Ruan
Wenzhen Yuan
Zehao Lin
Ning Liao
Zhiyu Li
Feiyu Xiong
Ting Liu
Yuzhuo Fu
41
5
0
24 Sep 2024
M$^2$PT: Multimodal Prompt Tuning for Zero-shot Instruction Learning
Taowen Wang
Yiyang Liu
James Liang
Junhan Zhao
Yiming Cui
...
Zenglin Xu
Cheng Han
Lifu Huang
Qifan Wang
Dongfang Liu
MLLM
VLM
LRM
22
15
0
24 Sep 2024
Efficient and Discriminative Image Feature Extraction for Universal Image Retrieval
Morris Florek
David Tschirschwitz
Björn Barz
Volker Rodehorst
VLM
18
0
0
20 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
39
1
0
02 Sep 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
45
87
0
29 Aug 2024
More Text, Less Point: Towards 3D Data-Efficient Point-Language Understanding
Yuan Tang
Xu Han
Xianzhi Li
Qiao Yu
Jinfeng Xu
Yixue Hao
Long Hu
Min Chen
30
1
0
28 Aug 2024
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders
Min Shi
Fuxiao Liu
Shihao Wang
Shijia Liao
Subhashree Radhakrishnan
...
Andrew Tao
Zhiding Yu
Guilin Liu
MLLM
23
53
0
28 Aug 2024
ParGo: Bridging Vision-Language with Partial and Global Views
An-Lan Wang
Bin Shan
Wei Shi
Kun-Yu Lin
Xiang Fei
Guozhi Tang
Lei Liao
Jingqun Tang
Can Huang
Wei-Shi Zheng
MLLM
VLM
77
13
0
23 Aug 2024
Building and better understanding vision-language models: insights and future directions
Hugo Laurençon
Andrés Marafioti
Victor Sanh
Léo Tronchon
VLM
34
60
0
22 Aug 2024
MaVEn: An Effective Multi-granularity Hybrid Visual Encoding Framework for Multimodal Large Language Model
Chaoya Jiang
Jia Hongrui
Haiyang Xu
Wei Ye
Mengfan Dong
Ming Yan
Ji Zhang
Fei Huang
Shikun Zhang
VLM
43
1
0
22 Aug 2024
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Bohao Xing
Zitong Yu
Xin Liu
Kaishen Yuan
Qilang Ye
Weicheng Xie
Huanjing Yue
Jingyu Yang
H. Kalviainen
48
10
0
21 Aug 2024
TDS-CLIP: Temporal Difference Side Network for Image-to-Video Transfer Learning
Bin Wang
Wenqian Wang
VLM
21
1
0
20 Aug 2024
CROME: Cross-Modal Adapters for Efficient Multimodal LLM
Sayna Ebrahimi
Sercan Ö. Arik
Tejas Nama
Tomas Pfister
37
1
0
13 Aug 2024
Efficient Test-Time Prompt Tuning for Vision-Language Models
Yuhan Zhu
Guozhen Zhang
Chen Xu
Haocheng Shen
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
29
2
0
11 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
23
10
0
08 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
37
2
0
01 Aug 2024
EZSR: Event-based Zero-Shot Recognition
Yan Yang
Sehwan Kim
Dongxu Li
Y. Sun
26
0
0
31 Jul 2024
CLEFT: Language-Image Contrastive Learning with Efficient Large Language Model and Prompt Fine-Tuning
Yuexi Du
Brian Chang
Nicha Dvornek
MedIm
VLM
24
2
0
30 Jul 2024
GABInsight: Exploring Gender-Activity Binding Bias in Vision-Language Models
Ali Abdollahi
Mahdi Ghaznavi
Mohammad Reza Karimi Nejad
Arash Mari Oriyad
Reza Abbasi
Ali Salesi
Melika Behjati
M. Rohban
M. Baghshah
CoGe
26
1
0
30 Jul 2024
OmniBal: Towards Fast Instruct-tuning for Vision-Language Models via Omniverse Computation Balance
Yongqiang Yao
Jingru Tan
Jiahao Hu
Feizhao Zhang
Xin Jin
...
Ruihao Gong
Pengfei Liu
Dahua Lin
Ningyi Xu
VLM
38
1
0
30 Jul 2024
Diffusion Feedback Helps CLIP See Better
Wenxuan Wang
Quan-Sen Sun
Fan Zhang
Yepeng Tang
Jing Liu
Xinlong Wang
VLM
38
14
0
29 Jul 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
33
0
0
29 Jul 2024
Exploring the Adversarial Robustness of CLIP for AI-generated Image Detection
Vincenzo De Rosa
Fabrizio Guillaro
Giovanni Poggi
D. Cozzolino
L. Verdoliva
AAML
52
4
0
28 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
32
0
0
28 Jul 2024
UOUO: Uncontextualized Uncommon Objects for Measuring Knowledge Horizons of Vision Language Models
Xinyu Pi
Mingyuan Wu
Jize Jiang
Haozhen Zheng
Beitong Tian
Chengxiang Zhai
Klara Nahrstedt
Zhiting Hu
VLM
28
1
0
25 Jul 2024
Unified Lexical Representation for Interpretable Visual-Language Alignment
Yifan Li
Yikai Wang
Yanwei Fu
Dongyu Ru
Zheng-Wei Zhang
Tong He
VLM
27
3
0
25 Jul 2024
MMInstruct: A High-Quality Multi-Modal Instruction Tuning Dataset with Extensive Diversity
Yangzhou Liu
Yue Cao
Zhangwei Gao
Weiyun Wang
Zhe Chen
...
Lewei Lu
Xizhou Zhu
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
42
22
0
22 Jul 2024
In-Context Learning Improves Compositional Understanding of Vision-Language Models
Matteo Nulli
Anesa Ibrahimi
Avik Pal
Hoshe Lee
Ivona Najdenkoska
VLM
CoGe
30
0
0
22 Jul 2024
Goldfish: Vision-Language Understanding of Arbitrarily Long Videos
Kirolos Ataallah
Xiaoqian Shen
Eslam Abdelrahman
Essam Sleiman
Mingchen Zhuge
Jian Ding
Deyao Zhu
Jürgen Schmidhuber
Mohamed Elhoseiny
VLM
17
17
0
17 Jul 2024
E5-V: Universal Embeddings with Multimodal Large Language Models
Ting Jiang
Minghui Song
Zihan Zhang
Haizhen Huang
Weiwei Deng
Feng Sun
Qi Zhang
Deqing Wang
Fuzhen Zhuang
VLM
23
19
0
17 Jul 2024