2205.01917
CoCa: Contrastive Captioners are Image-Text Foundation Models
4 May 2022
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
Papers citing
"CoCa: Contrastive Captioners are Image-Text Foundation Models"
50 / 1,042 papers shown
With Great Backbones Comes Great Adversarial Transferability
Erik Arakelyan
Karen Hambardzumyan
Davit Papikyan
Pasquale Minervini
Albert Gordo
Isabelle Augenstein
Aram H. Markosyan
AAML
358
0
0
21 Jan 2025
BIOMEDICA: An Open Biomedical Image-Caption Archive, Dataset, and Vision-Language Models Derived from Scientific Literature
Computer Vision and Pattern Recognition (CVPR), 2025
Alejandro Lozano
Min Woo Sun
James Burgess
Liangyu Chen
Jeffrey Nirschl
...
Xiaohan Wang
Yuhui Zhang
Alfred Seunghoon Song
Robert Tibshirani
Serena Yeung-Levy
LM&MA
VLM
MedIm
468
23
0
13 Jan 2025
VLM2Vec: Training Vision-Language Models for Massive Multimodal Embedding Tasks
International Conference on Learning Representations (ICLR), 2024
Ziyan Jiang
Rui Meng
Xinyi Yang
Semih Yavuz
Yingbo Zhou
Lei Ma
MLLM
VLM
590
98
0
03 Jan 2025
Tuning Vision-Language Models with Candidate Labels by Prompt Alignment
Zhifang Zhang
Yuwei Niu
Xin Liu
Beibei Li
VPVLM
VLM
450
2
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
368
2
0
25 Dec 2024
Cross-Modal Few-Shot Learning with Second-Order Neural Ordinary Differential Equations
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yi Zhang
Chun-Wun Cheng
Junyi He
Zhihai He
Carola-Bibiane Schonlieb
Yuyan Chen
Angelica I Aviles-Rivero
AI4TS
321
0
0
20 Dec 2024
Defeasible Visual Entailment: Benchmark, Evaluator, and Reward-Driven Optimization
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yue Zhang
Liqiang Jing
Vibhav Gogate
419
12
0
19 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Knowledge Discovery and Data Mining (KDD), 2024
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
254
8
0
17 Dec 2024
CATSplat: Context-Aware Transformer with Spatial Guidance for Generalizable 3D Gaussian Splatting from A Single-View Image
Wonseok Roh
Hwanhee Jung
Jong Wook Kim
Seanie Lee
Innfarn Yoo
Andreas Lugmayr
Seunggeun Chi
K. Ramani
Sangpil Kim
3DGS
353
6
0
17 Dec 2024
LLMs are Also Effective Embedding Models: An In-depth Overview
Chongyang Tao
Tao Shen
Shen Gao
Junshuo Zhang
Zhen Li
Kai Hua
Wenpeng Hu
Zhengwei Tao
Shuai Ma
396
27
0
17 Dec 2024
CPath-Omni: A Unified Multimodal Foundation Model for Patch and Whole Slide Image Analysis in Computational Pathology
Computer Vision and Pattern Recognition (CVPR), 2024
Yuxuan Sun
Yixuan Si
Chenglu Zhu
Xuan Gong
Jianchao Tan
Pingyi Chen
Ye Zhang
Honglin Li
Tao Lin
Lin Yang
VLM
288
0
0
16 Dec 2024
SAMIC: Segment Anything with In-Context Spatial Prompt Engineering
S. Nagendra
Kashif Rashid
Chaopeng Shen
Daniel Kifer
VLM
327
4
0
16 Dec 2024
UCDR-Adapter: Exploring Adaptation of Pre-Trained Vision-Language Models for Universal Cross-Domain Retrieval
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Haoyu Jiang
Zhi-Qi Cheng
Gabriel Moreira
Jiawen Zhu
Yuxuan Zhou
Bukun Ren
Jun-Yan He
Jingdong Sun
Xian-Sheng Hua
VLM
352
3
0
14 Dec 2024
DiffCLIP: Few-shot Language-driven Multimodal Classifier
AAAI Conference on Artificial Intelligence (AAAI), 2024
Jiaqing Zhang
Mingxiang Cao
Xue Yang
Kai Jiang
Yunsong Li
VLM
263
2
0
10 Dec 2024
Visual Lexicon: Rich Image Features in Language Space
Computer Vision and Pattern Recognition (CVPR), 2024
Xudong Wang
Xingyi Zhou
Alireza Fathi
Trevor Darrell
Cordelia Schmid
VLM
208
7
0
09 Dec 2024
LLaVA-SpaceSGG: Visual Instruct Tuning for Open-vocabulary Scene Graph Generation with Enhanced Spatial Relations
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Mingjie Xu
Mengyang Wu
Yuzhi Zhao
Jason Chun Lok Li
Weifeng Ou
LRM
SyDa
VLM
292
10
0
09 Dec 2024
Unified Framework for Open-World Compositional Zero-shot Learning
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Hirunima Jayasekara
Khoi Pham
Nirat Saini
Abhinav Shrivastava
298
1
0
05 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Computer Vision and Pattern Recognition (CVPR), 2024
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
312
20
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Computer Vision and Pattern Recognition (CVPR), 2024
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
705
7
0
02 Dec 2024
CAREL: Instruction-guided reinforcement learning with cross-modal auxiliary objectives
Armin Saghafian
Amirmohammad Izadi
Negin Hashemi Dijujin
M. Baghshah
456
0
0
29 Nov 2024
Orthus: Autoregressive Interleaved Image-Text Generation with Modality-Specific Heads
Siqi Kou
Jiachun Jin
Chang Liu
Ye Ma
Jian Jia
Quan Chen
Peng Jiang
Zhijie Deng
DiffM
VGen
VLM
604
28
0
28 Nov 2024
VLM-HOI: Vision Language Models for Interpretable Human-Object Interaction Analysis
Donggoo Kang
Dasol Jeong
Hyunmin Lee
Sangwoo Park
Hasil Park
Sunkyu Kwon
Yeongjoon Kim
Joonki Paik
MLLM
VLM
343
1
0
27 Nov 2024
Evaluating Vision-Language Models as Evaluators in Path Planning
Computer Vision and Pattern Recognition (CVPR), 2024
Mohamed Aghzal
Xiang Yue
Erion Plaku
Ziyu Yao
LRM
664
4
0
27 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Computer Vision and Pattern Recognition (CVPR), 2024
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
288
8
0
24 Nov 2024
Self-Calibrated CLIP for Training-Free Open-Vocabulary Segmentation
Sule Bai
Yong-Jin Liu
Yifei Han
Haoji Zhang
Yansong Tang
VLM
628
19
0
24 Nov 2024
Semantic Shield: Defending Vision-Language Models Against Backdooring and Poisoning via Fine-grained Knowledge Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Alvi Md Ishmam
Christopher Thomas
AAML
328
7
0
23 Nov 2024
OphCLIP: Hierarchical Retrieval-Augmented Learning for Ophthalmic Surgical Video-Language Pretraining
Ming Hu
Kun Yuan
Yaling Shen
Feilong Tang
Xiaohao Xu
...
Jin Ye
N. Padoy
Nassir Navab
Junjun He
Zongyuan Ge
VLM
CLIP
438
23
0
23 Nov 2024
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections
Xitong Ling
Yuanyuan Lei
Jiawen Li
Junru Cheng
Wenting Huang
Tian Guan
Jian Guan
Yonghong He
178
4
0
16 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
303
10
0
14 Nov 2024
SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency
Yangyang Guo
Mohan S. Kankanhalli
VLM
69
3
0
14 Nov 2024
Which Viewpoint Shows it Best? Language for Weakly Supervising View Selection in Multi-view Instructional Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Sagnik Majumder
Tushar Nagarajan
Ziad Al-Halah
Reina Pradhan
Kristen Grauman
424
0
0
13 Nov 2024
Hierarchical Visual Feature Aggregation for OCR-Free Document Understanding
Neural Information Processing Systems (NeurIPS), 2024
Jaeyoo Park
Jin Young Choi
Jeonghyung Park
Bohyung Han
VLM
141
8
0
08 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Neural Information Processing Systems (NeurIPS), 2024
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
419
7
0
05 Nov 2024
Domain Expansion and Boundary Growth for Open-Set Single-Source Domain Generalization
IEEE transactions on multimedia (IEEE TMM), 2024
Pengkun Jiao
Na Zhao
Yue Yu
Yu-Gang Jiang
OOD
318
3
0
05 Nov 2024
INQUIRE: A Natural World Text-to-Image Retrieval Benchmark
Neural Information Processing Systems (NeurIPS), 2024
Edward Vendrow
Omiros Pantazis
Alexander Shepard
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
Sara Beery
Grant Van Horn
VLM
369
22
0
04 Nov 2024
Driving by the Rules: A Benchmark for Integrating Traffic Sign Regulations into Vectorized HD Map
Computer Vision and Pattern Recognition (CVPR), 2024
Xinyuan Chang
Maixuan Xue
Xinran Liu
Zheng Pan
Xing Wei
611
7
0
31 Oct 2024
EMMA: End-to-End Multimodal Model for Autonomous Driving
Jyh-Jing Hwang
Runsheng Xu
Hubert Lin
Wei-Chih Hung
Jingwei Ji
...
Benjamin Sapp
Yin Zhou
James Guo
Dragomir Anguelov
Mingxing Tan
VLM
LM&Ro
433
116
0
30 Oct 2024
AlphaChimp: Tracking and Behavior Recognition of Chimpanzees
Xiaoxuan Ma
Yutang Lin
Yuan Xu
Stephan P. Kaufhold
Jack Terwilliger
Andres Meza
Yixin Zhu
Federico Rossano
Yizhou Wang
449
4
0
22 Oct 2024
Beyond Filtering: Adaptive Image-Text Quality Enhancement for MLLM Pretraining
Han Huang
Yuqi Huo
Zijia Zhao
Haoyu Lu
Shu Wu
Bin Wang
Qiang Liu
Weipeng Chen
VLM
183
2
0
21 Oct 2024
xGen-MM-Vid (BLIP-3-Video): You Only Need 32 Tokens to Represent a Video Even in VLMs
Michael S Ryoo
Honglu Zhou
Shrikant B. Kendre
Can Qin
Le Xue
...
Kanchana Ranasinghe
Ran Xu
Caiming Xiong
Juan Carlos Niebles
VGen
310
26
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
International Conference on Learning Representations (ICLR), 2024
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
439
17
0
21 Oct 2024
Assistive AI for Augmenting Human Decision-making
Natabara Máté Gyöngyössy
Bernát Török
Csilla Farkas
Laura Lucaj
Attila Menyhárd
Krisztina Menyhárd-Balázs
András Simonyi
Patrick van der Smagt
Zsolt Ződi
András Lőrincz
306
0
0
18 Oct 2024
Dual Prototype Evolving for Test-Time Generalization of Vision-Language Models
Neural Information Processing Systems (NeurIPS), 2024
Ce Zhang
Simon Stepputtis
Katia Sycara
Yaqi Xie
VLM
261
23
0
16 Oct 2024
DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM
Neural Information Processing Systems (NeurIPS), 2024
Yingjun Shen
Haizhao Dai
Qihe Chen
Yan Zeng
Jiakai Zhang
Yuan Pei
Jingyi Yu
247
4
0
15 Oct 2024
Locality Alignment Improves Vision-Language Models
International Conference on Learning Representations (ICLR), 2024
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
592
11
0
14 Oct 2024
Mamba4Cast: Efficient Zero-Shot Time Series Forecasting with State Space Models
Sathya Kamesh Bhethanabhotla
Omar Swelam
Julien N. Siems
David Salinas
Katharina Eggensperger
Mamba
AI4TS
AI4CE
217
13
0
12 Oct 2024
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Kun Ding
Qiang Yu
Haojian Zhang
Gaofeng Meng
Shiming Xiang
VLM
198
2
0
11 Oct 2024
On a Hidden Property in Computational Imaging
Yinan Feng
Yinpeng Chen
Yueh Lee
Youzuo Lin
202
0
0
11 Oct 2024
LatteCLIP: Unsupervised CLIP Fine-Tuning via LMM-Synthetic Texts
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2024
Anh-Quan Cao
M. Jaritz
Matthieu Guillaumin
Raoul de Charette
Loris Bazzani
VLM
CLIP
350
4
0
10 Oct 2024
Evaluating Computational Pathology Foundation Models for Prostate Cancer Grading under Distribution Shifts
Fredrik K. Gustafsson
Mattias Rantalainen
OOD
MedIm
200
5
0
09 Oct 2024