arXiv:2111.09734
ClipCap: CLIP Prefix for Image Captioning
18 November 2021
Ron Mokady
Amir Hertz
Amit H. Bermano
CLIP
VLM
Papers citing
"ClipCap: CLIP Prefix for Image Captioning"
37 / 87 papers shown
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
12
10
0
14 Jun 2023
Scalable Performance Analysis for Vision-Language Models
Santiago Castro
Oana Ignat
Rada Mihalcea
VLM
19
1
0
30 May 2023
Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Haiwei Wu
Jiantao Zhou
Shile Zhang
108
27
0
23 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
21
5
0
20 May 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
30
155
0
19 May 2023
OpenShape: Scaling Up 3D Shape Representation Towards Open-World Understanding
Minghua Liu
Ruoxi Shi
Kaiming Kuang
Yinhao Zhu
Xuanlin Li
Shizhong Han
H. Cai
Fatih Porikli
Hao Su
3DPC
22
115
0
18 May 2023
VPGTrans: Transfer Visual Prompt Generator across LLMs
Ao Zhang
Hao Fei
Yuan Yao
Wei Ji
Li Li
Zhiyuan Liu
Tat-Seng Chua
MLLM
VLM
18
85
0
02 May 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
20
50
0
06 Apr 2023
AVFormer: Injecting Vision into Frozen Speech Models for Zero-Shot AV-ASR
Paul Hongsuck Seo
Arsha Nagrani
Cordelia Schmid
27
15
0
29 Mar 2023
LLaMA-Adapter: Efficient Fine-tuning of Language Models with Zero-init Attention
Renrui Zhang
Jiaming Han
Chris Liu
Peng Gao
Aojun Zhou
Xiangfei Hu
Shilin Yan
Pan Lu
Hongsheng Li
Yu Qiao
MLLM
23
736
0
28 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
24
4
0
04 Mar 2023
Prompt Stealing Attacks Against Text-to-Image Generation Models
Xinyue Shen
Y. Qu
Michael Backes
Yang Zhang
17
31
0
20 Feb 2023
Prompting for Multimodal Hateful Meme Classification
Rui Cao
Roy Ka-Wei Lee
Wen-Haw Chong
Jing Jiang
VLM
17
74
0
08 Feb 2023
Eliminating Contextual Prior Bias for Semantic Image Editing via Dual-Cycle Diffusion
Zuopeng Yang
Tianshu Chu
Xin Lin
Erdun Gao
Daqing Liu
J. Yang
Chaoyue Wang
DiffM
15
16
0
05 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
19
17
0
02 Feb 2023
Joint Representation Learning for Text and 3D Point Cloud
Rui Huang
Xuran Pan
Henry Zheng
Haojun Jiang
Zhifeng Xie
S. Song
Gao Huang
13
19
0
18 Jan 2023
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
19
22
0
13 Dec 2022
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
22
51
0
05 Dec 2022
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Ronghang Hu
Xinlei Chen
Matthias Nießner
Angel X. Chang
17
52
0
01 Dec 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Kazuya Ohata
Shunsuke Kitada
Hitoshi Iyatomi
14
0
0
17 Nov 2022
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
Vivian Liu
Jo Vermeulen
G. Fitzmaurice
Justin Matejka
HAI
25
116
0
20 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
19
23
0
20 Oct 2022
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Roberto Dessì
Eleonora Gualdoni
Francesca Franzon
Gemma Boleda
Marco Baroni
VLM
13
6
0
20 Oct 2022
Content-Based Search for Deep Generative Models
Daohan Lu
Sheng-Yu Wang
Nupur Kumari
Rohan Agarwal
Mia Tang
David Bau
Jun-Yan Zhu
DiffM
SyDa
30
5
0
06 Oct 2022
Every picture tells a story: Image-grounded controllable stylistic story generation
Holy Lovenia
Bryan Wilie
Romain Barraud
Samuel Cahyawijaya
Willy Chung
Pascale Fung
8
8
0
04 Sep 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
27
97
0
10 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
22
66
0
03 Aug 2022
Semantic Abstraction: Open-World 3D Scene Understanding from 2D Vision-Language Models
Huy Ha
Shuran Song
LM&Ro
VLM
28
101
0
23 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
19
225
0
16 Jun 2022
Seeding Diversity into AI Art
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
22
4
0
02 May 2022
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
17
14
0
28 Mar 2022
Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui
Lichen Zhao
Feng Liang
Yangguang Li
Jing Shao
UQCV
VLM
CLIP
17
43
0
11 Mar 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
8
88
0
31 Jan 2022
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
30
191
0
29 Nov 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
53
244
0
14 Jul 2021
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
250
922
0
24 Sep 2019