arXiv:2205.01917
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Papers citing "CoCa: Contrastive Captioners are Image-Text Foundation Models" (50 of 1,041 shown)
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
International Conference on Learning Representations (ICLR), 2023
Tianhong Li
Sangnie Bhardwaj
Yonglong Tian
Han Zhang
Jarred Barber
Dina Katabi
Guillaume Lajoie
Huiwen Chang
Dilip Krishnan
VLM
248
7
0
05 Oct 2023
On the Cognition of Visual Question Answering Models and Human Intelligence: A Comparative Study
Liben Chen
Long Chen
Tian Ellison-Chen
Zhuoyuan Xu
LRM
98
0
0
04 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
272
6
0
02 Oct 2023
Large Scale Masked Autoencoding for Reducing Label Requirements on SAR Data
Matt Allen
Francisco Dorr
Joseph A. Gallego-Mejia
Laura Martínez-Ferrer
Anna Jungbluth
F. Kalaitzis
Raúl Ramos-Pollán
258
10
0
02 Oct 2023
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Tianyu Yu
Jinyi Hu
Yuan Yao
Haoye Zhang
Yue Zhao
...
Jiao Xue
Dahai Li
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
VLM
MLLM
144
23
0
01 Oct 2023
Beyond Task Performance: Evaluating and Reducing the Flaws of Large Multimodal Models with In-Context Learning
International Conference on Learning Representations (ICLR), 2023
Mustafa Shukor
Alexandre Ramé
Corentin Dancette
Matthieu Cord
LRM
MLLM
372
26
0
01 Oct 2023
Knowledge Engineering using Large Language Models
Bradley Paul Allen
Lise Stork
Paul T. Groth
232
28
0
01 Oct 2023
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
International Conference on Learning Representations (ICLR), 2023
Yulu Gan
Sungwoo Park
Alexander Schubert
Anthony Philippakis
Ahmed Alaa
VLM
248
30
0
30 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
229
22
0
28 Sep 2023
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Sanghwan Kim
Hao Tang
Fisher Yu
VLM
CLIP
213
5
0
28 Sep 2023
Parameter-Saving Adversarial Training: Reinforcing Multi-Perturbation Robustness via Hypernetworks
Huihui Gong
Minjing Dong
Siqi Ma
S. Çamtepe
Surya Nepal
Chang Xu
AAML
OOD
166
1
0
28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Computer Vision and Pattern Recognition (CVPR), 2023
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
183
39
0
27 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
British Machine Vision Conference (BMVC), 2023
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL
VLM
184
4
0
26 Sep 2023
CWCL: Cross-Modal Transfer with Continuously Weighted Contrastive Loss
Neural Information Processing Systems (NeurIPS), 2023
R. S. Srinivasa
Jaejin Cho
Chouchang Yang
Yashas Malur Saidutta
Ching Hua Lee
Yilin Shen
Hongxia Jin
VLM
180
15
0
26 Sep 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Z. Yao
Xiaoxia Wu
Conglong Li
Minjia Zhang
Heyang Qi
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
230
11
0
25 Sep 2023
Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Adam Pardyl
Grzegorz Kurzejamski
Jan Olszewski
Tomasz Trzciński
Bartosz Zieliński
145
4
0
23 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
IEEE International Conference on Computer Vision (ICCV), 2023
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Xiyang Dai
...
Xi
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM
OODD
212
96
0
21 Sep 2023
ContextRef: Evaluating Referenceless Metrics For Image Description Generation
International Conference on Learning Representations (ICLR), 2023
Elisa Kreiss
E. Zelikman
Christopher Potts
Nick Haber
225
5
0
21 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
International Conference on Learning Representations (ICLR), 2023
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
270
271
0
20 Sep 2023
In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Nina Shvetsova
Anna Kukleva
Bernt Schiele
Hilde Kuehne
DiffM
198
6
0
16 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
IEEE International Conference on Computer Vision (ICCV), 2023
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
200
31
0
14 Sep 2023
Improving Multimodal Classification of Social Media Posts by Leveraging Image-Text Auxiliary Tasks
Findings (Findings), 2023
Danae Sánchez Villegas
Daniel Preoţiuc-Pietro
Nikolaos Aletras
182
4
0
14 Sep 2023
PRE: Vision-Language Prompt Learning with Reparameterization Encoder
Anh Pham Thi Minh
An Duc Nguyen
Georgios Tzimiropoulos
VPVLM
VLM
212
3
0
14 Sep 2023
Sight Beyond Text: Multi-Modal Training Enhances LLMs in Truthfulness and Ethics
Haoqin Tu
Bingchen Zhao
Chen Wei
Cihang Xie
MLLM
127
19
0
13 Sep 2023
Language Models as Black-Box Optimizers for Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Shihong Liu
Zhiqiu Lin
Samuel Yu
Ryan Lee
Tiffany Ling
Deepak Pathak
Deva Ramanan
VLM
329
39
0
12 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
International Conference on Learning Representations (ICLR), 2023
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Chen Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
221
75
0
09 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM
VLM
249
156
0
07 Sep 2023
Dual Relation Alignment for Composed Image Retrieval
Xintong Jiang
Yaxiong Wang
Yujiao Wu
Ming Wang
Xueming Qian
179
8
0
05 Sep 2023
ExMobileViT: Lightweight Classifier Extension for Mobile Vision Transformer
Gyeongdong Yang
Yungwook Kwon
Hyunjin Kim
ViT
94
2
0
04 Sep 2023
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
British Machine Vision Conference (BMVC), 2023
Yi Zhang
Ce Zhang
Zihan Liao
Yushun Tang
Zhihai He
BDL
VLM
259
11
0
03 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
IEEE International Conference on Computer Vision (ICCV), 2023
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
294
37
0
02 Sep 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
IEEE International Conference on Computer Vision (ICCV), 2023
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
265
9
0
31 Aug 2023
Do the Frankenstein, or how to achieve better out-of-distribution performance with manifold mixing model soup
Hannes Fassold
MoMe
UQCV
107
2
0
28 Aug 2023
Fine-tuning can cripple your foundation model; preserving features may be the solution
Jishnu Mukhoti
Y. Gal
Juil Sock
P. Dokania
CLL
356
67
0
25 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
489
1,530
0
24 Aug 2023
DLIP: Distilling Language-Image Pre-training
Huafeng Kuang
Jie Wu
Xiawu Zheng
Ming Li
Xuefeng Xiao
Rui Wang
Min Zheng
Rongrong Ji
VLM
128
6
0
24 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
International Conference on Learning Representations (ICLR), 2023
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
272
75
0
23 Aug 2023
Local Distortion Aware Efficient Transformer Adaptation for Image Quality Assessment
Kangmin Xu
Liang Liao
Jing Xiao
Chaofeng Chen
Haoning Wu
Qiong Yan
Weisi Lin
ViT
142
9
0
23 Aug 2023
Unsupervised Prototype Adapter for Vision-Language Models
Chinese Conference on Pattern Recognition and Computer Vision (CPRCV), 2023
Yi Zhang
Ce Zhang
Xue-mei Hu
Z. He
VLM
208
8
0
22 Aug 2023
GrowCLIP: Data-aware Automatic Model Growing for Large-scale Contrastive Language-Image Pre-training
IEEE International Conference on Computer Vision (ICCV), 2023
Xi Deng
Han Shi
Runhu Huang
Changlin Li
Hang Xu
Jianhua Han
James T. Kwok
Shen Zhao
Wei Zhang
Xiaodan Liang
CLIP
VLM
186
3
0
22 Aug 2023
Foundation Model-oriented Robustness: Robust Image Model Evaluation with Pretrained Models
International Conference on Learning Representations (ICLR), 2023
Peiyan Zhang
Hao Liu
Chaozhuo Li
Xing Xie
Sunghun Kim
Haohan Wang
VLM
OOD
296
9
0
21 Aug 2023
Dataset Quantization
IEEE International Conference on Computer Vision (ICCV), 2023
Daquan Zhou
Kaixin Wang
Jianyang Gu
Xiang Peng
Dongze Lian
Yifan Zhang
Yang You
Jiashi Feng
DD
163
58
0
21 Aug 2023
Generic Attention-model Explainability by Weighted Relevance Accumulation
ACM Multimedia Asia (MA), 2023
Yiming Huang
Ao Jia
Xiaodan Zhang
Jiawei Zhang
122
4
0
20 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
140
1
0
20 Aug 2023
TDG: Text-guided Domain Generalization
Geng-Ming Liu
Yuxi Wang
OOD
246
4
0
19 Aug 2023
Towards Grounded Visual Spatial Reasoning in Multi-Modal Vision Language Models
Navid Rajabi
Jana Kosecka
VLM
255
17
0
18 Aug 2023
Artificial-Spiking Hierarchical Networks for Vision-Language Representation Learning
Ye-Ting Chen
Siyu Zhang
Yaoru Sun
Weijian Liang
Haoran Wang
148
3
0
18 Aug 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
494
7
0
18 Aug 2023
Prompt Switch: Efficient CLIP Adaptation for Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Chaorui Deng
Qi Chen
Pengda Qin
Dave Zhenyu Chen
Qi Wu
VLM
CLIP
190
44
0
15 Aug 2023
CTP: Towards Vision-Language Continual Pretraining via Compatible Momentum Contrast and Topology Preservation
IEEE International Conference on Computer Vision (ICCV), 2023
Hongguang Zhu
Yunchao Wei
Xiaodan Liang
Chunjie Zhang
Yao-Min Zhao
VLM
127
35
0
14 Aug 2023