arXiv: 2205.01917
CoCa: Contrastive Captioners are Image-Text Foundation Models

4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM, CLIP, OffRL

Papers citing "CoCa: Contrastive Captioners are Image-Text Foundation Models"

50 / 1,041 papers shown
Title
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Computer Vision and Pattern Recognition (CVPR), 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
555
888
0
14 Nov 2022
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP), 2022
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM, CLIP
172
5
0
14 Nov 2022
Zero-shot Visual Commonsense Immorality Prediction
British Machine Vision Conference (BMVC), 2022
Yujin Jeong
Seongbeom Park
Suhong Moon
Jinkyu Kim
VLM
79
3
0
10 Nov 2022
Okapi: Generalising Better by Making Statistical Matches Match
Neural Information Processing Systems (NeurIPS), 2022
Myles Bartlett
Sara Romiti
V. Sharmanska
Novi Quadrianto
153
3
0
07 Nov 2022
Boosting Binary Neural Networks via Dynamic Thresholds Learning
Jiehua Zhang
Xueyang Zhang
Z. Su
Zitong Yu
Yanghe Feng
Xin Lu
M. Pietikäinen
Li Liu
MQ
237
0
0
04 Nov 2022
A simple, efficient and scalable contrastive masked autoencoder for learning visual representations
Shlok Kumar Mishra
Joshua Robinson
Huiwen Chang
David Jacobs
Aaron Sarna
Aaron Maschinot
Dilip Krishnan
DiffM
218
37
0
30 Oct 2022
FaD-VLP: Fashion Vision-and-Language Pre-training towards Unified Retrieval and Captioning
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Suvir Mirchandani
Licheng Yu
Mengjiao MJ Wang
Animesh Sinha
Wen-Jun Jiang
Tao Xiang
Ning Zhang
230
17
0
26 Oct 2022
A Case for Business Process-Specific Foundation Models
Sadhana Kumaravel
Praveen Venkateswaran
Vatche Isahagian
Vinod Muthusamy
AI4CE
170
12
0
26 Oct 2022
The Curious Case of Benign Memorization
International Conference on Learning Representations (ICLR), 2022
Sotiris Anagnostidis
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
AAML
323
12
0
25 Oct 2022
Global Contrastive Batch Sampling via Optimization on Sample Permutations
International Conference on Machine Learning (ICML), 2022
Vin Sachidananda
Ziyi Yang
Chenguang Zhu
301
6
0
23 Oct 2022
CPL: Counterfactual Prompt Learning for Vision and Language Models
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Xuehai He
Diji Yang
Weixi Feng
Tsu-Jui Fu
Arjun Reddy Akula
Varun Jampani
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
VPVLM, VLM
288
19
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP, VLM, MedIm
361
667
0
18 Oct 2022
Perceptual Grouping in Contrastive Vision-Language Models
IEEE International Conference on Computer Vision (ICCV), 2022
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
VLM
392
71
0
18 Oct 2022
Non-Contrastive Learning Meets Language-Image Pre-Training
Computer Vision and Pattern Recognition (CVPR), 2022
Jinghao Zhou
Li Dong
Zhe Gan
Lijuan Wang
Furu Wei
VLM, CLIP
182
33
0
17 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Neural Information Processing Systems (NeurIPS), 2022
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM, MLLM, CLIP
764
4,476
0
16 Oct 2022
CAP: Correlation-Aware Pruning for Highly-Accurate Sparse Vision Models
Neural Information Processing Systems (NeurIPS), 2022
Denis Kuznedelev
Eldar Kurtic
Elias Frantar
Dan Alistarh
VLM, ViT
172
21
0
14 Oct 2022
Caption supervision enables robust learners
Ben Feuer
Ameya Joshi
Chinmay Hegde
SSL, CLIP, VLM
182
3
0
13 Oct 2022
Leveraging Off-the-shelf Diffusion Model for Multi-attribute Fashion Image Manipulation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
Chaerin Kong
D. Jeon
Oh-Hun Kwon
Nojun Kwak
DiffM
152
19
0
12 Oct 2022
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
British Machine Vision Conference (BMVC), 2022
Omiros Pantazis
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
VLM
212
49
0
07 Oct 2022
SynBench: Task-Agnostic Benchmarking of Pretrained Representations using Synthetic Data
Ching-Yun Ko
Pin-Yu Chen
Jeet Mohapatra
Payel Das
Lucani E. Daniel
289
3
0
06 Oct 2022
MuRAG: Multimodal Retrieval-Augmented Generator for Open Question Answering over Images and Text
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Wenhu Chen
Hexiang Hu
Xi Chen
Pat Verga
William W. Cohen
RALM
284
230
0
06 Oct 2022
A Closer Look at Robustness to L-infinity and Spatial Perturbations and their Composition
Luke Rowe
Benjamin Thérien
Krzysztof Czarnecki
Hongyang R. Zhang
OOD
135
0
0
05 Oct 2022
Progressive Text-to-Image Generation
Zhengcong Fei
Mingyuan Fan
Li Zhu
Junshi Huang
268
4
0
05 Oct 2022
ASIF: Coupled Data Turns Unimodal Models to Multimodal Without Training
Neural Information Processing Systems (NeurIPS), 2022
Antonio Norelli
Marco Fumero
Valentino Maiorca
Luca Moschella
Emanuele Rodolà
Francesco Locatello
VLM
332
44
0
04 Oct 2022
LASP: Text-to-Text Optimization for Language-Aware Soft Prompting of Vision & Language Models
Computer Vision and Pattern Recognition (CVPR), 2022
Adrian Bulat
Georgios Tzimiropoulos
VLM, VPVLM
215
70
0
03 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM, VLM
210
44
0
03 Oct 2022
Zemi: Learning Zero-Shot Semi-Parametric Language Models from Multiple Tasks
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Zhenhailong Wang
Xiaoman Pan
Dian Yu
Dong Yu
Jianshu Chen
Heng Ji
VLM
215
10
0
01 Oct 2022
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
International Conference on Learning Representations (ICLR), 2022
Ziyuan Qin
Huahui Yi
Qicheng Lao
Kang Li
VLM
287
85
0
30 Sep 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
139
21
0
30 Sep 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2022
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shin'ichi Satoh
Luc Van Gool
Zheng Wang
AAML
467
107
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
234
4
0
29 Sep 2022
Spotlight: Mobile UI Understanding using Vision-Language Models with a Focus
International Conference on Learning Representations (ICLR), 2022
Gang Li
Yang Li
274
81
0
29 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Neural Information Processing Systems (NeurIPS), 2022
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM, VLM
255
178
0
15 Sep 2022
Neural Networks Reduction via Lumping
International Conference of the Italian Association for Artificial Intelligence (AIxIA), 2022
Dalila Ressi
Riccardo Romanello
S. Rossi
Carla Piazza
191
5
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
International Journal of Computer Vision (IJCV), 2022
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
320
12
0
14 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
International Conference on Learning Representations (ICLR), 2022
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM, VLM
678
897
0
14 Sep 2022
Vision Transformers for Action Recognition: A Survey
Anwaar Ulhaq
Naveed Akhtar
Ganna Pogrebna
Lin Wang
ViT
197
65
0
13 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
243
20
0
08 Sep 2022
What does a platypus look like? Generating customized prompts for zero-shot image classification
IEEE International Conference on Computer Vision (ICCV), 2022
Sarah M Pratt
Ian Covert
Rosanne Liu
Ali Farhadi
VLM
432
306
0
07 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM, SSL
118
2
0
06 Sep 2022
Language-aware Domain Generalization Network for Cross-Scene Hyperspectral Image Classification
IEEE Transactions on Geoscience and Remote Sensing (IEEE TGRS), 2022
Yuxiang Zhang
Mengmeng Zhang
Wei Li
Shuai Wang
Ran Tao
VLM
287
134
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
350
2
0
05 Sep 2022
Generalization in Neural Networks: A Broad Survey
Neurocomputing (Neurocomputing), 2022
Chris Rohlfs
OOD, AI4CE
229
16
0
04 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
ACM Computing Surveys (ACM CSUR), 2022
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Tengjiao Wang
Ming-Hsuan Yang
DiffM, MedIm
1.3K
1,846
0
02 Sep 2022
Topic Detection in Continuous Sign Language Videos
Álvaro Budria
Laia Tarrés
Gerard I. Gállego
Francesc Moreno-Noguer
Jordi Torres
Xavier Giró-i-Nieto
SLR, VLM
183
2
0
01 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
British Machine Vision Conference (BMVC), 2022
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM, CLIP
265
27
0
29 Aug 2022
Overparameterization from Computational Constraints
Neural Information Processing Systems (NeurIPS), 2022
Sanjam Garg
S. Jha
Saeed Mahloujifar
Mohammad Mahmoody
Mingyuan Wang
144
3
0
27 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM, VLM, ViT
530
704
0
22 Aug 2022
Improved Image Classification with Token Fusion
IEEE Access (IEEE Access), 2022
Keong-Hun Choi
Jin-Woo Kim
Yaolong Wang
J. Ha
ViT
166
0
0
19 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
231
83
0
11 Aug 2022