2205.01917
Cited By
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Papers citing
"CoCa: Contrastive Captioners are Image-Text Foundation Models"
50 / 1,042 papers shown
From Isolated Islands to Pangea: Unifying Semantic Space for Human Action Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Yong-Lu Li
Xiaoqian Wu
Xinpeng Liu
Zehao Wang
Yiming Dou
...
Junyi Zhang
Yixing Li
Jingru Tan
Xudong Lu
Cewu Lu
470
19
0
02 Apr 2023
SoftCLIP: Softer Cross-modal Alignment Makes CLIP Stronger
AAAI Conference on Artificial Intelligence (AAAI), 2023
Yuting Gao
Jinfeng Liu
Zi-Han Xu
Tong Wu
Wen Liu
Jie Yang
Keren Li
Xingen Sun
CLIP
VLM
212
75
0
30 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
246
9
0
30 Mar 2023
AutoAD: Movie Description in Context
Computer Vision and Pattern Recognition (CVPR), 2023
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
256
49
0
29 Mar 2023
Physics-Driven Diffusion Models for Impact Sound Synthesis from Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Kun Su
Kaizhi Qian
Eli Shlizerman
Antonio Torralba
Chuang Gan
VGen
AI4CE
304
29
0
29 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM
VLM
382
30
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
532
238
0
28 Mar 2023
CoRe-Sleep: A Multimodal Fusion Framework for Time Series Robust to Imperfect Modalities
IEEE transactions on neural systems and rehabilitation engineering (IEEE TNSRE), 2023
Konstantinos Kontras
Christos Chatzichristos
Huy P Phan
Johan A. K. Suykens
Marina De Vos
AI4TS
194
21
0
27 Mar 2023
IRFL: Image Recognition of Figurative Language
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Ron Yosef
Yonatan Bitton
Dafna Shahaf
344
28
0
27 Mar 2023
Sigmoid Loss for Language Image Pre-Training
IEEE International Conference on Computer Vision (ICCV), 2023
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
1.8K
2,232
0
27 Mar 2023
Zero-Shot Composed Image Retrieval with Textual Inversion
IEEE International Conference on Computer Vision (ICCV), 2023
Alberto Baldrati
Lorenzo Agnolucci
Marco Bertini
Alberto Del Bimbo
278
166
0
27 Mar 2023
Equivariant Similarity for Vision-Language Foundation Models
IEEE International Conference on Computer Vision (ICCV), 2023
Tan Wang
Kevin Qinghong Lin
Linjie Li
Chung-Ching Lin
Zhengyuan Yang
Hanwang Zhang
Zicheng Liu
Lijuan Wang
CoGe
281
63
0
25 Mar 2023
VILA: Learning Image Aesthetics from User Comments with Vision-Language Pretraining
Computer Vision and Pattern Recognition (CVPR), 2023
Junjie Ke
Keren Ye
Jiahui Yu
Yonghui Wu
P. Milanfar
Feng Yang
VLM
254
81
0
24 Mar 2023
Accelerating Vision-Language Pretraining with Free Language Modeling
Computer Vision and Pattern Recognition (CVPR), 2023
Teng Wang
Yixiao Ge
Feng Zheng
Ran Cheng
Ying Shan
Xiaohu Qie
Ping Luo
VLM
MLLM
176
11
0
24 Mar 2023
The effectiveness of MAE pre-pretraining for billion-scale pretraining
IEEE International Conference on Computer Vision (ICCV), 2023
Mannat Singh
Quentin Duval
Kalyan Vasudev Alwala
Haoqi Fan
Vaibhav Aggarwal
...
Piotr Dollár
Christoph Feichtenhofer
Ross B. Girshick
Rohit Girdhar
Ishan Misra
LRM
377
86
0
23 Mar 2023
CoBIT: A Contrastive Bi-directional Image-Text Generation Model
International Conference on Learning Representations (ICLR), 2023
Haoxuan You
Mandy Guo
Zhecan Wang
Kai-Wei Chang
Jason Baldridge
Jiahui Yu
DiffM
210
14
0
23 Mar 2023
Weakly Supervised Video Representation Learning with Unaligned Text for Sequential Videos
Computer Vision and Pattern Recognition (CVPR), 2023
Sixun Dong
Huazhang Hu
Dongze Lian
Weixin Luo
Yichen Qian
Shenghua Gao
ViT
AI4TS
278
18
0
22 Mar 2023
MAGVLT: Masked Generative Vision-and-Language Transformer
Computer Vision and Pattern Recognition (CVPR), 2023
Sungwoong Kim
DaeJin Jo
Donghoon Lee
Jongmin Kim
VLM
129
16
0
21 Mar 2023
VideoXum: Cross-modal Visual and Textural Summarization of Videos
IEEE transactions on multimedia (IEEE TMM), 2023
Jingyang Lin
Hang Hua
Ming Chen
Yikang Li
Jenhao Hsiao
C. Ho
Jiebo Luo
381
50
0
21 Mar 2023
eP-ALM: Efficient Perceptual Augmentation of Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Mustafa Shukor
Corentin Dancette
Matthieu Cord
MLLM
VLM
417
34
0
20 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Image and Vision Computing (IVC), 2023
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
400
409
0
20 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
ACM Multimedia (ACM MM), 2023
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
403
3
0
18 Mar 2023
IRGen: Generative Modeling for Image Retrieval
European Conference on Computer Vision (ECCV), 2023
Yidan Zhang
Ting Zhang
Dong Chen
Yujing Wang
Qi Chen
...
Tao Gui
Fan Yang
Mao Yang
Q. Liao
B. Guo
3DV
VLM
325
21
0
17 Mar 2023
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Kyle Buettner
Adriana Kovashka
210
0
0
17 Mar 2023
Unified Visual Relationship Detection with Vision and Language Models
IEEE International Conference on Computer Vision (ICCV), 2023
Long Zhao
Liangzhe Yuan
Boqing Gong
Huayu Chen
Florian Schroff
Ming-Hsuan Yang
Hartwig Adam
Ting Liu
ObjD
293
12
0
16 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
IEEE Transactions on Image Processing (IEEE TIP), 2023
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
329
7
0
16 Mar 2023
Lana: A Language-Capable Navigator for Instruction Following and Generation
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaohan Wang
Wenguan Wang
Jiayi Shao
Yi Yang
LLMAG
LM&Ro
237
56
0
15 Mar 2023
Architext: Language-Driven Generative Architecture Design
Theodoros Galanos
Antonios Liapis
Georgios N. Yannakakis
VLM
AI4CE
292
7
0
13 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
International Journal of Computer Vision (IJCV), 2023
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
Ziwei Liu
CLL
235
168
0
13 Mar 2023
Breaking Common Sense: WHOOPS! A Vision-and-Language Benchmark of Synthetic and Compositional Images
IEEE International Conference on Computer Vision (ICCV), 2023
Nitzan Bitton-Guetta
Yonatan Bitton
Jack Hessel
Ludwig Schmidt
Yuval Elovici
Gabriel Stanovsky
Roy Schwartz
VLM
460
87
0
13 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM
MoE
326
98
0
13 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
IEEE International Conference on Computer Vision (ICCV), 2023
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
233
2
0
13 Mar 2023
Multi-metrics adaptively identifies backdoors in Federated learning
IEEE International Conference on Computer Vision (ICCV), 2023
Siquan Huang
Yijiang Li
Chong Chen
Leyu Shi
Ying Gao
AAML
262
44
0
12 Mar 2023
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review
Asim Waqas
Aakash Tripathi
Ravichandran Ramachandran
Paul Stewart
Ghulam Rasool
AI4CE
480
81
0
11 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David Clifton
243
12
0
11 Mar 2023
Tag2Text: Guiding Vision-Language Model via Image Tagging
International Conference on Learning Representations (ICLR), 2023
Xinyu Huang
Youcai Zhang
Jinyu Ma
Weiwei Tian
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Lei Zhang
CLIP
MLLM
VLM
3DV
412
97
0
10 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
142
1
0
09 Mar 2023
Interpretable Visual Question Answering Referring to Outside Knowledge
IEEE International Conference on Image Processing (ICIP), 2023
He Zhu
Ren Togo
Takahiro Ogawa
Miki Haseyama
138
1
0
08 Mar 2023
Your representations are in the network: composable and parallel adaptation for large scale models
Neural Information Processing Systems (NeurIPS), 2023
Yonatan Dukler
Alessandro Achille
Hao Yang
Varsha Vivek
Luca Zancato
Benjamin Bowman
Avinash Ravichandran
Charless C. Fowlkes
A. Swaminathan
Stefano Soatto
297
3
0
07 Mar 2023
iBall: Augmenting Basketball Videos with Gaze-moderated Embedded Visualizations
International Conference on Human Factors in Computing Systems (CHI), 2023
Zhutian Chen
Qisen Yang
Jiarui Shan
Tica Lin
Johanna Beyer
Haijun Xia
Hanspeter Pfister
291
37
0
06 Mar 2023
DeCap: Decoding CLIP Latents for Zero-Shot Captioning via Text-Only Training
International Conference on Learning Representations (ICLR), 2023
Wei Li
Linchao Zhu
Longyin Wen
Yi Yang
VLM
229
119
0
06 Mar 2023
Prismer: A Vision-Language Model with Multi-Task Experts
Shikun Liu
Linxi Fan
Edward Johns
Zhiding Yu
Chaowei Xiao
Anima Anandkumar
VLM
MLLM
315
33
0
04 Mar 2023
FAME-ViL: Multi-Tasking Vision-Language Model for Heterogeneous Fashion Tasks
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoping Han
Xiatian Zhu
Licheng Yu
Li Zhang
Yi-Zhe Song
Tao Xiang
VLM
179
64
0
04 Mar 2023
Fine-Grained ImageNet Classification in the Wild
Maria Lymperaiou
Konstantinos Thomas
Giorgos Stamou
VLM
157
1
0
04 Mar 2023
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
International Conference on Learning Representations (ICLR), 2023
Shiwei Liu
Tianlong Chen
Zhenyu Zhang
Xuxi Chen
Tianjin Huang
Ajay Jaiswal
Zinan Lin
210
31
0
03 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
450
19
0
03 Mar 2023
Visual Atoms: Pre-training Vision Transformers with Sinusoidal Waves
Computer Vision and Pattern Recognition (CVPR), 2023
Sora Takashima
Ryo Hayamizu
Nakamasa Inoue
Hirokatsu Kataoka
Rio Yokota
233
25
0
02 Mar 2023
Aligning benchmark datasets for table structure recognition
IEEE International Conference on Document Analysis and Recognition (ICDAR), 2023
B. Smock
Rohith Pesala
Robin Abraham
LMTD
216
19
0
01 Mar 2023
On the Importance of Feature Representation for Flood Mapping using Classical Machine Learning Approaches
Kevin Iselborn
Marco Stricker
T. Miyamoto
Marlon Nuske
Andreas Dengel
AI4CE
142
1
0
01 Mar 2023
Rethinking Efficient Tuning Methods from a Unified Perspective
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Yiliang Lv
Deli Zhao
Jingren Zhou
231
15
0
01 Mar 2023