2205.01917
Cited By
v1
v2 (latest)
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Papers citing "CoCa: Contrastive Captioners are Image-Text Foundation Models"
50 / 1,041 papers shown
Title
Orthogonal Temporal Interpolation for Zero-Shot Video Recognition
ACM Multimedia (ACM MM), 2023
Yan Zhu
Junbao Zhuo
B. Ma
Jiajia Geng
Xiaoming Wei
Xiaolin K. Wei
Shuhui Wang
VLM
120
6
0
14 Aug 2023
Temporally-Adaptive Models for Efficient Video Understanding
Ziyuan Huang
Shiwei Zhang
Liang Pan
Zhiwu Qing
Yingya Zhang
Ziwei Liu
Marcelo H. Ang
185
16
0
10 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
Indian Conference on Computer Vision, Graphics & Image Processing (ICVGIP), 2023
S. Chaudhuri
Saumik Bhattacharya
168
4
0
07 Aug 2023
Learning Concise and Descriptive Attributes for Visual Recognition
IEEE International Conference on Computer Vision (ICCV), 2023
Andy Yan
Yu Wang
Yiwu Zhong
Chengyu Dong
Zexue He
Yujie Lu
William Wang
Jingbo Shang
Julian McAuley
VLM
230
84
0
07 Aug 2023
Distributionally Robust Classification on a Data Budget
Ben Feuer
Ameya Joshi
Minh Pham
Chinmay Hegde
OOD
225
2
0
07 Aug 2023
DiT: Efficient Vision Transformers with Dynamic Token Routing
Yuchen Ma
Zhengcong Fei
Junshi Huang
ViT
201
2
0
07 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Neural Information Processing Systems (NeurIPS), 2023
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
281
198
0
04 Aug 2023
The All-Seeing Project: Towards Panoptic Visual Recognition and Understanding of the Open World
International Conference on Learning Representations (ICLR), 2023
Weiyun Wang
Min Shi
Qingyun Li
Wen Wang
Zhenhang Huang
...
Zhiguo Cao
Yushi Chen
Tong Lu
Jifeng Dai
Yu Qiao
LRM
MLLM
247
117
0
03 Aug 2023
Guiding Image Captioning Models Toward More Specific Captions
IEEE International Conference on Computer Vision (ICCV), 2023
Simon Kornblith
Lala Li
Zirui Wang
Thao Nguyen
309
19
0
31 Jul 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
283
54
0
30 Jul 2023
Cross-Modal Concept Learning and Inference for Vision-Language Models
Neurocomputing, 2023
Yi Zhang
Ce Zhang
Yushun Tang
Z. He
VLM
MLLM
CLIP
184
20
0
28 Jul 2023
CLIP Brings Better Features to Visual Aesthetics Learners
Liwu Xu
Jinjin Xu
Yuzhe Yang
Yi-Jie Huang
Yanchun Xie
Yaqian Li
VLM
197
5
0
28 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
416
151
0
25 Jul 2023
Towards a Visual-Language Foundation Model for Computational Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Ivy Liang
...
Andrew Zhang
L. Le
Georg Gerber
Anil V. Parwani
Faisal Mahmood
VLM
MedIm
294
57
0
24 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Computer Vision and Pattern Recognition (CVPR), 2023
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
333
73
0
24 Jul 2023
PRIOR: Prototype Representation Joint Learning from Medical Images and Reports
IEEE International Conference on Computer Vision (ICCV), 2023
Pujin Cheng
Li Lin
Junyan Lyu
Yijin Huang
Tong Lu
Xiaoying Tang
MedIm
380
78
0
24 Jul 2023
Towards Video Anomaly Retrieval from Video Anomaly Detection: New Benchmarks and Model
IEEE Transactions on Image Processing (IEEE TIP), 2023
Peng Wu
Jing Liu
Xiangteng He
Yuxin Peng
Peng Wang
Yanning Zhang
378
46
0
24 Jul 2023
Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?
IEEE International Conference on Computer Vision (ICCV), 2023
Cheng-En Wu
Yu Tian
Haichao Yu
Heng Wang
Pedro Morgado
Yu Hen Hu
Linjie Yang
NoLa
VPVLM
VLM
127
25
0
22 Jul 2023
Robust Visual Question Answering: Datasets, Methods, and Future Challenges
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jie Ma
Pinghui Wang
Dechen Kong
Zewei Wang
Jun Liu
Hongbin Pei
Junzhou Zhao
OOD
299
44
0
21 Jul 2023
GIST: Generating Image-Specific Text for Fine-grained Object Classification
Kathleen M. Lewis
Emily Mu
Adrian Dalca
John Guttag
VLM
148
10
0
21 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Jiaming Song
Yu Qiao
Wanli Ouyang
Xiangyu Yue
180
181
0
20 Jul 2023
A Holistic Assessment of the Reliability of Machine Learning Systems
Anthony Corso
David Karamadian
Romeo Valentin
Mary Cooper
Mykel J. Kochenderfer
308
10
0
20 Jul 2023
Classification of Visualization Types and Perspectives in Patents
International Conference on Theory and Practice of Digital Libraries (TPDL), 2023
J. Ghauri
Eric Müller-Budack
Ralph Ewerth
206
4
0
19 Jul 2023
Improving Multimodal Datasets with Image Captioning
Neural Information Processing Systems (NeurIPS), 2023
Thao Nguyen
S. Gadre
Gabriel Ilharco
Sewoong Oh
Ludwig Schmidt
VLM
243
123
0
19 Jul 2023
Divert More Attention to Vision-Language Object Tracking
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Mingzhe Guo
Zhipeng Zhang
Li Jing
Haibin Ling
Heng Fan
VLM
253
13
0
19 Jul 2023
Distilling Knowledge from Text-to-Image Generative Models Improves Visio-Linguistic Reasoning in CLIP
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
S. Basu
S. Hu
Maziar Sanjabi
Daniela Massiceti
Soheil Feizi
VLM
201
6
0
18 Jul 2023
Multimodal LLMs for health grounded in individual-specific data
Anastasiya Belyaeva
J. Cosentino
F. Hormozdiari
Krish Eswaran
S. Shetty
Greg C. Corrado
Andrew Carroll
Cory Y. McLean
N. Furlotte
LM&MA
225
78
0
18 Jul 2023
Fine-grained Text-Video Retrieval with Frozen Image Encoders
Zuozhuo Dai
Fang Shao
Qingkun Su
Zilong Dong
Siyu Zhu
396
1
0
14 Jul 2023
Bootstrapping Vision-Language Learning with Decoupled Language Pre-training
Neural Information Processing Systems (NeurIPS), 2023
Yiren Jian
Chongyang Gao
Soroush Vosoughi
VLM
MLLM
368
43
0
13 Jul 2023
mBLIP: Efficient Bootstrapping of Multilingual Vision-LLMs
Gregor Geigle
Abhay Jain
Radu Timofte
Goran Glavaš
VLM
MLLM
219
41
0
13 Jul 2023
Can Vision-Language Models be a Good Guesser? Exploring VLMs for Times and Location Reasoning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Gengyuan Zhang
Yurui Zhang
Kerui Zhang
Volker Tresp
LRM
291
23
0
12 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
IEEE International Conference on Computer Vision (ICCV), 2023
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
323
131
0
11 Jul 2023
Emu: Generative Pretraining in Multimodality
International Conference on Learning Representations (ICLR), 2023
Quan-Sen Sun
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Yueze Wang
Hongcheng Gao
Jingjing Liu
Tiejun Huang
Xinlong Wang
MLLM
337
155
0
11 Jul 2023
Divide, Evaluate, and Refine: Evaluating and Improving Text-to-Image Alignment with Iterative VQA Feedback
Neural Information Processing Systems (NeurIPS), 2023
Jaskirat Singh
Liang Zheng
283
36
0
10 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
166
7
0
06 Jul 2023
VideoGLUE: Video General Understanding Evaluation of Foundation Models
Liangzhe Yuan
N. B. Gundavarapu
Long Zhao
Hao Zhou
Huayu Chen
...
Florian Schroff
Hartwig Adam
Ming-Hsuan Yang
Ting Liu
Boqing Gong
ELM
236
14
0
06 Jul 2023
Review of Large Vision Models and Visual Prompt Engineering
Yuan Liu
Zheng Liu
Lin Zhao
Zihao Wu
Chong Ma
...
Bao Ge
Yixuan Yuan
Hongtu Zhu
Tianming Liu
Shu Zhang
VLM
LRM
297
207
0
03 Jul 2023
ProbVLM: Probabilistic Adapter for Frozen Vision-Language Models
Uddeshya Upadhyay
Shyamgopal Karthik
Goran Frehse
Zeynep Akata
MLLM
VLM
434
5
0
01 Jul 2023
Stitched ViTs are Flexible Vision Backbones
European Conference on Computer Vision (ECCV), 2023
Zizheng Pan
Jing Liu
Haoyu He
Jianfei Cai
Bohan Zhuang
167
4
0
30 Jun 2023
Goal Representations for Instruction Following: A Semi-Supervised Language Interface to Control
Conference on Robot Learning (CoRL), 2023
Vivek Myers
Andre Wang He
Kuan Fang
Homer Walke
Philippe Hansen-Estruch
Ching-An Cheng
Mihai Jalobeanu
Andrey Kolobov
Anca Dragan
Sergey Levine
LM&Ro
392
38
0
30 Jun 2023
Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation
Jiaxing Huang
Jingyi Zhang
Han Qiu
Sheng Jin
Shijian Lu
VPVLM
VLM
336
3
0
29 Jun 2023
EgoCOL: Egocentric Camera pose estimation for Open-world 3D object Localization @Ego4D challenge 2023
Cristhian Forigua
María Escobar
Jordi Pont-Tuset
Kevis-Kokitsi Maninis
Pablo Arbelaez
EgoV
234
2
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
282
14
0
28 Jun 2023
Towards Open Vocabulary Learning: A Survey
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
406
214
0
28 Jun 2023
HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution
Neural Information Processing Systems (NeurIPS), 2023
Eric N. D. Nguyen
Michael Poli
Marjan Faizi
A. Thomas
Callum Birch-Sykes
...
Stefano Massaroli
Yoshua Bengio
Stefano Ermon
S. Baccus
Christopher Ré
MedIm
295
400
0
27 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
263
24
0
27 Jun 2023
Semi-supervised Multimodal Representation Learning through a Global Workspace
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023
Benjamin Devillers
Léopold Maytié
R. V. Rullen
SSL
150
10
0
27 Jun 2023
Switch-BERT: Learning to Model Multimodal Interactions by Switching Attention and Input
European Conference on Computer Vision (ECCV), 2023
Qingpei Guo
Kaisheng Yao
Wei Chu
MLLM
79
6
0
25 Jun 2023
Exploring Data Redundancy in Real-world Image Classification through Data Selection
Zhenyu Tang
Shaoting Zhang
Xiaosong Wang
149
3
0
25 Jun 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Neural Information Processing Systems (NeurIPS), 2023
Ayca Takmaz
Elisabetta Fedele
R. Sumner
Marc Pollefeys
F. Tombari
Francis Engelmann
ISeg
VLM
232
244
0
23 Jun 2023