Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2205.01917
Cited By
v1
v2 (latest)
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (3 upvotes)
Papers citing
"CoCa: Contrastive Captioners are Image-Text Foundation Models"
50 / 1,041 papers shown
Title
LingoQA: Video Question Answering for Autonomous Driving
Ana-Maria Marcu
Long Chen
Jan Hünermann
Alice Karnsund
Benoît Hanotte
...
Vijay Badrinarayanan
Alex Kendall
Jamie Shotton
Elahe Arani
Oleg Sinavski
120
89
0
21 Dec 2023
Multimodal Federated Learning with Missing Modality via Prototype Mask and Contrast
Guangyin Bao
Tao Gui
Duoqian Miao
Zixuan Gong
Liang Hu
Ke Liu
Yang Liu
Chongyang Shi
152
16
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
164
3
0
21 Dec 2023
Learning Object State Changes in Videos: An Open-World Perspective
Zihui Xue
Kumar Ashutosh
Kristen Grauman
VGen
292
33
0
19 Dec 2023
Tuning LayerNorm in Attention: Towards Efficient Multi-Modal LLM Finetuning
Bingchen Zhao
Haoqin Tu
Chen Wei
Jieru Mei
Cihang Xie
263
51
0
18 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Computer Vision and Pattern Recognition (CVPR), 2023
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
Gabriel Loaiza-Ganem
Anthony L. Caterini
409
8
0
15 Dec 2023
General Object Foundation Model for Images and Videos at Scale
Computer Vision and Pattern Recognition (CVPR), 2023
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
300
74
0
14 Dec 2023
Improving Cross-modal Alignment with Synthetic Pairs for Text-only Image Captioning
AAAI Conference on Artificial Intelligence (AAAI), 2023
Zhiyue Liu
Jinyuan Liu
Fanrong Ma
CLIP
VLM
212
20
0
14 Dec 2023
TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training
AAAI Conference on Artificial Intelligence (AAAI), 2023
Chaoya Jiang
Wei Ye
Haiyang Xu
Qinghao Ye
Mingshi Yan
Ji Zhang
Shikun Zhang
CLIP
VLM
262
6
0
14 Dec 2023
ViLA: Efficient Video-Language Alignment for Video Question Answering
European Conference on Computer Vision (ECCV), 2023
Xijun Wang
Junbang Liang
Chun-Kai Wang
Kenan Deng
Yu Lou
Ming-Chyuan Lin
Shan Yang
289
21
0
13 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm
LM&MA
178
28
0
13 Dec 2023
CLIP as RNN: Segment Countless Visual Concepts without Training Endeavor
Computer Vision and Pattern Recognition (CVPR), 2023
Shuyang Sun
Runjia Li
Juil Sock
Xiuye Gu
Siyang Li
VLM
CLIP
419
54
0
12 Dec 2023
Domain Prompt Learning with Quaternion Networks
Computer Vision and Pattern Recognition (CVPR), 2023
Qinglong Cao
Zhengqin Xu
Yuntian Chen
Chao Ma
Xiaokang Yang
VLM
228
20
0
12 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
333
194
0
11 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
254
106
0
11 Dec 2023
RCA-NOC: Relative Contrastive Alignment for Novel Object Captioning
Jiashuo Fan
Yaoyuan Liang
Leyao Liu
Shao-Lun Huang
Lei Zhang
255
6
0
11 Dec 2023
Causal-CoG: A Causal-Effect Look at Context Generation for Boosting Multi-modal Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Shitian Zhao
Zhuowan Li
Yadong Lu
Yaoyao Liu
Yan Wang
LRM
167
14
0
09 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
European Conference on Computer Vision (ECCV), 2023
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
252
25
0
08 Dec 2023
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making
Shusen Liu
Haichao Miao
Zhimin Li
M. Olson
Valerio Pascucci
P. Bremer
290
15
0
07 Dec 2023
TokenCompose: Text-to-Image Diffusion with Token-level Supervision
Zirui Wang
Zhizhou Sha
Zheng Ding
Yilin Wang
Zhuowen Tu
DiffM
255
14
0
06 Dec 2023
Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey
Shengchao Chen
Guodong Long
Jing Jiang
Dikai Liu
Chengqi Zhang
SyDa
AI4CE
296
35
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
International Conference on Machine Learning (ICML), 2023
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Yaoyao Liu
Cihang Xie
VLM
254
11
0
04 Dec 2023
Cross-Modal Adaptive Dual Association for Text-to-Image Person Retrieval
IEEE transactions on multimedia (IEEE TMM), 2023
Dixuan Lin
Yi-Xing Peng
Jingke Meng
Wei-Shi Zheng
167
25
0
04 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
European Conference on Computer Vision (ECCV), 2023
Feng Wang
Jieru Mei
Yaoyao Liu
VLM
329
116
0
04 Dec 2023
PixelLM: Pixel Reasoning with Large Multimodal Model
Computer Vision and Pattern Recognition (CVPR), 2023
Zhongwei Ren
Zhicheng Huang
Yunchao Wei
Yao-Min Zhao
Dongmei Fu
Jiashi Feng
Xiaojie Jin
VLM
MLLM
LRM
341
184
0
04 Dec 2023
How to Configure Good In-Context Sequence for Visual Question Answering
Computer Vision and Pattern Recognition (CVPR), 2023
Li Li
Jiawei Peng
Huiyi Chen
Chongyang Gao
Xu Yang
MLLM
210
36
0
04 Dec 2023
Behind the Magic, MERLIM: Multi-modal Evaluation Benchmark for Large Image-Language Models
Andrés Villa
Juan Carlos León Alcázar
Alvaro Soto
Bernard Ghanem
MLLM
VLM
256
18
0
03 Dec 2023
A Comprehensive Study of Vision Transformers in Image Classification Tasks
Mahmoud Khalil
Ahmad Khalil
A. Ngom
ViT
185
14
0
02 Dec 2023
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Computer Vision and Pattern Recognition (CVPR), 2023
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
319
72
0
01 Dec 2023
Segment and Caption Anything
Computer Vision and Pattern Recognition (CVPR), 2023
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
202
32
0
01 Dec 2023
Infrared Image Super-Resolution via GAN
Y. Huang
S. Omachi
GAN
263
0
0
01 Dec 2023
LightCLIP: Learning Multi-Level Interaction for Lightweight Vision-Language Models
Ying Nie
Wei He
Kai Han
Yehui Tang
Tianyu Guo
Fanyi Du
Yunhe Wang
VLM
200
5
0
01 Dec 2023
Vision-Language Models Learn Super Images for Efficient Partially Relevant Video Retrieval
ACM Transactions on Multimedia Computing, Communications, and Applications (TOMCCAP) (TOMM), 2023
Taichi Nishimura
Shota Nakada
Masayoshi Kondo
VLM
283
6
0
01 Dec 2023
Green Edge AI: A Contemporary Survey
Proceedings of the IEEE (Proc. IEEE), 2023
Yuyi Mao
X. Yu
Kaibin Huang
Ying-Jun Angela Zhang
Jun Zhang
358
51
0
01 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Pawan Sinha
Arabinda Kumar Choudhary
Samee U. Khan
Khoa Luu
ViT
MedIm
324
4
0
30 Nov 2023
MLLMs-Augmented Visual-Language Representation Learning
Yanqing Liu
Kai Wang
Wenqi Shao
Ping Luo
Yu Qiao
Mike Zheng Shou
Kaipeng Zhang
Yang You
VLM
224
19
0
30 Nov 2023
Synthesize, Diagnose, and Optimize: Towards Fine-Grained Vision-Language Understanding
Computer Vision and Pattern Recognition (CVPR), 2023
Wujian Peng
Sicheng Xie
Zuyao You
Shiyi Lan
Zuxuan Wu
VLM
CoGe
MLLM
486
39
0
30 Nov 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
451
2
0
30 Nov 2023
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra
Kathleen M. Lewis
Guha Balakrishnan
VLM
148
1
0
29 Nov 2023
CLIPC8: Face liveness detection algorithm based on image-text pairs and contrastive learning
Xu Liu
Shu Zhou
Yurong Song
Wenzhe Luo
Xin Zhang
132
2
0
29 Nov 2023
Contrastive Vision-Language Alignment Makes Efficient Instruction Learner
Lizhao Liu
Xinyu Sun
Tianhang Xiang
Zhuangwei Zhuang
Liuren Yin
Mingkui Tan
VLM
159
4
0
29 Nov 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Computer Vision and Pattern Recognition (CVPR), 2023
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
567
81
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
226
7
0
28 Nov 2023
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng Xu
ObjD
230
13
0
28 Nov 2023
Emergent Open-Vocabulary Semantic Segmentation from Off-the-shelf Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Jiayun Luo
Siddhesh Khandelwal
Leonid Sigal
Boyang Albert Li
MLLM
VLM
550
11
0
28 Nov 2023
IG Captioner: Information Gain Captioners are Strong Zero-shot Classifiers
European Conference on Computer Vision (ECCV), 2023
Chenglin Yang
Siyuan Qiao
Yuan Cao
Yu Zhang
Tao Zhu
Yaoyao Liu
Jiahui Yu
VLM
145
3
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Computer Vision and Pattern Recognition (CVPR), 2023
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
179
32
0
27 Nov 2023
MMMU: A Massive Multi-discipline Multimodal Understanding and Reasoning Benchmark for Expert AGI
Computer Vision and Pattern Recognition (CVPR), 2023
Xiang Yue
Yuansheng Ni
Kai Zhang
Tianyu Zheng
Ruoqi Liu
...
Yibo Liu
Wenhao Huang
Huan Sun
Yu-Chuan Su
Wenhu Chen
OSLM
ELM
VLM
841
1,575
0
27 Nov 2023
Efficient Pre-training for Localized Instruction Generation of Videos
Anil Batra
Davide Moltisanti
Laura Sevilla-Lara
Marcus Rohrbach
Frank Keller
356
0
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Computer Vision and Pattern Recognition (CVPR), 2023
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
240
15
0
27 Nov 2023
Previous
1
2
3
...
10
11
12
...
19
20
21
Next