arXiv:2205.01917 (v2, latest)
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Papers citing "CoCa: Contrastive Captioners are Image-Text Foundation Models"
50 / 1,042 papers shown
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Computer Vision and Pattern Recognition (CVPR), 2023
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
497
325
0
27 Feb 2023
Cross-modal Contrastive Learning for Multimodal Fake News Detection
ACM Multimedia (ACM MM), 2023
Longzheng Wang
Chuang Zhang
Hongbo Xu
Yongxiu Xu
Xiaohan Xu
Siqi Wang
263
75
0
25 Feb 2023
Language-Driven Representation Learning for Robotics
Siddharth Karamcheti
Suraj Nair
Annie S. Chen
Thomas Kollar
Chelsea Finn
Dorsa Sadigh
Abigail Z. Jacobs
LM&Ro
SSL
277
190
0
24 Feb 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Computer Vision and Pattern Recognition (CVPR), 2023
Mengde Xu
Zheng Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
311
363
0
23 Feb 2023
Aligning Text-to-Image Models using Human Feedback
Kimin Lee
Hao Liu
Moonkyung Ryu
Olivia Watkins
Yuqing Du
Craig Boutilier
Pieter Abbeel
Mohammad Ghavamzadeh
S. Gu
EGVM
338
383
0
23 Feb 2023
Language Model Crossover: Variation through Few-Shot Prompting
ACM Transactions on Evolutionary Learning and Optimization (TELO), 2023
Elliot Meyerson
M. Nelson
Herbie Bradley
Adam Gaier
Arash Moradi
Amy K. Hoover
Joel Lehman
VLM
458
124
0
23 Feb 2023
Test-Time Distribution Normalization for Contrastively Learned Vision-language Models
Neural Information Processing Systems (NeurIPS), 2023
Yi Zhou
Juntao Ren
Fengyu Li
Ramin Zabih
Ser-Nam Lim
VLM
244
21
0
22 Feb 2023
Deep Active Learning in the Presence of Label Noise: A Survey
Moseli Motsóehli
Kyungim Baek
NoLa
VLM
278
5
0
22 Feb 2023
Optical Transformers
Maxwell G. Anderson
Shifan Ma
Tianyu Wang
Logan G. Wright
Peter L. McMahon
149
35
0
20 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Machine Intelligence Research (MIR), 2023
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
464
272
0
20 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Vasu Sharma
Vinija Jain
211
0
0
19 Feb 2023
Zero-Shot Anomaly Detection via Batch Normalization
Neural Information Processing Systems (NeurIPS), 2023
Aodong Li
Chen Qiu
Matthias Kirchler
Padhraic Smyth
Maja R. Rudolph
Stephan Mandt
470
0
0
15 Feb 2023
Sparse-SignSGD with Majority Vote for Communication-Efficient Distributed Learning
International Symposium on Information Theory (ISIT), 2023
Chanho Park
Namyoon Lee
FedML
157
6
0
15 Feb 2023
Symbolic Discovery of Optimization Algorithms
Neural Information Processing Systems (NeurIPS), 2023
Xiangning Chen
Chen Liang
Da Huang
Esteban Real
Kaiyuan Wang
...
Xuanyi Dong
Thang Luong
Cho-Jui Hsieh
Yifeng Lu
Quoc V. Le
769
513
0
13 Feb 2023
Paparazzi: A Deep Dive into the Capabilities of Language and Vision Models for Grounding Viewpoint Descriptions
Findings (Findings), 2023
Henrik Voigt
J. Hombeck
M. Meuschke
K. Lawonn
Sina Zarrieß
VLM
229
2
0
13 Feb 2023
Less is More: Selective Layer Finetuning with SubTuning
Gal Kaplun
Andrey Gurevich
Tal Swisa
Mazor David
Shai Shalev-Shwartz
Eran Malach
212
11
0
13 Feb 2023
Calibrating a Deep Neural Network with Its Predecessors
International Joint Conference on Artificial Intelligence (IJCAI), 2023
Linwei Tao
Minjing Dong
Daochang Liu
Changming Sun
Chang Xu
BDL
UQCV
236
8
0
13 Feb 2023
CoMAE: Single Model Hybrid Pre-training on Small-Scale RGB-D Datasets
AAAI Conference on Artificial Intelligence (AAAI), 2023
Jiang Yang
Sheng Guo
Gangshan Wu
Limin Wang
VLM
145
9
0
13 Feb 2023
NYCU-TWO at Memotion 3: Good Foundation, Good Teacher, then you have Good Meme Analysis
Yu-Chien Tang
Kuang-Da Wang
Ting-Yun Ou
Wenjie Peng
129
2
0
13 Feb 2023
Scaling Vision Transformers to 22 Billion Parameters
International Conference on Machine Learning (ICML), 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
407
766
0
10 Feb 2023
Analyzing Multimodal Objectives Through the Lens of Generative Diffusion Guidance
Chaerin Kong
Nojun Kwak
DiffM
172
2
0
10 Feb 2023
Re-ViLM: Retrieval-Augmented Visual Language Model for Zero and Few-Shot Image Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Zhuolin Yang
Ming-Yu Liu
Zihan Liu
V. Korthikanti
Weili Nie
...
Yuke Zhu
Mohammad Shoeybi
Bryan Catanzaro
Chaowei Xiao
Anima Anandkumar
VLM
RALM
194
50
0
09 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
199
4
0
07 Feb 2023
Pic2Word: Mapping Pictures to Words for Zero-shot Composed Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
Kuniaki Saito
Kihyuk Sohn
Xiang Zhang
Chun-Liang Li
Chen-Yu Lee
Kate Saenko
Tomas Pfister
308
165
0
06 Feb 2023
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
International Conference on Machine Learning (ICML), 2023
Zekun Qi
Runpei Dong
Guo Fan
Zheng Ge
Xiangyu Zhang
Kaisheng Ma
Li Yi
396
187
0
05 Feb 2023
IC3: Image Captioning by Committee Consensus
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
296
23
0
02 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
International Conference on Machine Learning (ICML), 2023
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
254
218
0
01 Feb 2023
UPop: Unified and Progressive Pruning for Compressing Vision-Language Transformers
International Conference on Machine Learning (ICML), 2023
Dachuan Shi
Chaofan Tao
Ying Jin
Zhendong Yang
Chun Yuan
Yuan Liu
VLM
ViT
365
55
0
31 Jan 2023
The Power of External Memory in Increasing Predictive Model Capacity
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
KELM
156
0
0
31 Jan 2023
Alternating Updates for Efficient Transformers
Neural Information Processing Systems (NeurIPS), 2023
Cenk Baykal
D. Cutler
Nishanth Dikkala
Nikhil Ghosh
Rina Panigrahy
Xin Wang
MoE
177
8
0
30 Jan 2023
Advancing Radiograph Representation Learning with Masked Record Modeling
International Conference on Learning Representations (ICLR), 2023
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
284
85
0
30 Jan 2023
Massively Scaling Heteroscedastic Classifiers
International Conference on Learning Representations (ICLR), 2023
Mark Collier
Rodolphe Jenatton
Basil Mustafa
N. Houlsby
Jesse Berent
E. Kokiopoulou
218
11
0
30 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.3K
6,618
0
30 Jan 2023
ACL-Fig: A Dataset for Scientific Figure Classification
Zeba Karishma
Shaurya Rohatgi
Kavya S. Puranik
Jian Wu
C. Lee Giles
93
11
0
28 Jan 2023
Neural Additive Models for Location Scale and Shape: A Framework for Interpretable Neural Regression Beyond the Mean
International Conference on Artificial Intelligence and Statistics (AISTATS), 2023
Anton Thielmann
René-Marcel Kruse
Thomas Kneib
Benjamin Säfken
252
23
0
27 Jan 2023
Discovering and Mitigating Visual Biases through Keyword Explanation
Computer Vision and Pattern Recognition (CVPR), 2023
Younghyun Kim
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jaeho Lee
Jinwoo Shin
507
50
0
26 Jan 2023
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
127
31
0
26 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
347
6
0
19 Jan 2023
Towards Models that Can See and Read
IEEE International Conference on Computer Vision (ICCV), 2023
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
285
16
0
18 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Computer Vision and Pattern Recognition (CVPR), 2023
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
231
77
0
17 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
181
6
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Computer Vision and Pattern Recognition (CVPR), 2023
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
188
14
0
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
IEEE International Conference on Computer Vision (ICCV), 2023
Bo Fang
Wenhao Wu
Chang-rui Liu
Can Ma
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
246
82
0
16 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramanan
VLM
450
152
0
16 Jan 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
European Conference on Information Retrieval (ECIR), 2023
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
300
5
0
12 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
302
18
0
12 Jan 2023
Does progress on ImageNet transfer to real-world datasets?
Neural Information Processing Systems (NeurIPS), 2023
Alex Fang
Simon Kornblith
Ludwig Schmidt
VLM
193
47
0
11 Jan 2023
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
Computer Vision and Pattern Recognition (CVPR), 2023
Shruthi Bannur
Stephanie L. Hyland
Qianchu Liu
Fernando Pérez-García
Maximilian Ilse
...
Maria T. A. Wetscherek
M. Lungren
A. Nori
Javier Alvarez-Valle
Ozan Oktay
313
203
0
11 Jan 2023
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2023
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
338
101
0
05 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
IEEE International Conference on Computer Vision (ICCV), 2023
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
127
30
0
05 Jan 2023