Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,339 papers shown
Title
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Zipeng Xu
Tianwei Lin
Hao Tang
Fu Li
Dongliang He
N. Sebe
Radu Timofte
Luc Van Gool
Errui Ding
EGVM
23
41
0
26 Nov 2021
Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains
X. Zhang
S. Gu
Yutaka Matsuo
Yusuke Iwasawa
VLM
30
36
0
25 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
34
215
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
14
292
0
24 Nov 2021
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li
Mengmeng Wang
Jianbiao Mei
Yong Liu
8
18
0
21 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
41
1,713
0
18 Nov 2021
One-Shot Generative Domain Adaptation
Ceyuan Yang
Yujun Shen
Zhiyi Zhang
Yinghao Xu
Jiapeng Zhu
Zhirong Wu
Bolei Zhou
13
49
0
18 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
31
33
0
17 Nov 2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
S. Kim
ELM
18
38
0
15 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
66
325
0
11 Nov 2021
Evolving Evocative 2D Views of Generated 3D Objects
Eric Chu
13
4
0
08 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
184
384
0
06 Nov 2021
Visualizing the Emergence of Intermediate Visual Patterns in DNNs
Mingjie Li
Shaobo Wang
Quanshi Zhang
8
11
0
05 Nov 2021
Projected GANs Converge Faster
Axel Sauer
Kashyap Chitta
Jens Muller
Andreas Geiger
15
234
0
01 Nov 2021
Qu-ANTI-zation: Exploiting Quantization Artifacts for Achieving Adversarial Outcomes
Sanghyun Hong
Michael-Andrei Panaitescu-Liess
Yigitcan Kaya
Tudor Dumitras
MQ
39
13
0
26 Oct 2021
Image-Based CLIP-Guided Essence Transfer
Hila Chefer
Sagie Benaim
Roni Paiss
Lior Wolf
CLIP
16
50
0
24 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
11
267
0
21 Oct 2021
MixNorm: Test-Time Adaptation Through Online Normalization Estimation
Xuefeng Hu
M. Uzunbas
Sirius Chen
Rui Wang
Ashish Shah
Ram Nevatia
Ser-Nam Lim
TTA
12
47
0
21 Oct 2021
Self-Initiated Open World Learning for Autonomous AI Agents
Bing-Quan Liu
Eric Robertson
Scott Grigsby
Sahisnu Mazumder
AI4CE
22
8
0
21 Oct 2021
Generalized Out-of-Distribution Detection: A Survey
Jingkang Yang
Kaiyang Zhou
Yixuan Li
Ziwei Liu
171
870
0
21 Oct 2021
No One Representation to Rule Them All: Overlapping Features of Training Methods
Raphael Gontijo-Lopes
Yann N. Dauphin
E. D. Cubuk
15
58
0
20 Oct 2021
BEAMetrics: A Benchmark for Language Generation Evaluation Evaluation
Thomas Scialom
Felix Hill
12
7
0
18 Oct 2021
Multimodal Dialogue Response Generation
Qingfeng Sun
Yujing Wang
Can Xu
Kai Zheng
Yaming Yang
Huang Hu
Fei Xu
Jessica Zhang
Xiubo Geng
Daxin Jiang
15
43
0
16 Oct 2021
A CLIP-Enhanced Method for Video-Language Understanding
Guohao Li
Feng He
Zhifan Feng
CLIP
22
12
0
14 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
51
974
0
09 Oct 2021
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim
Taesung Kwon
Jong Chul Ye
DiffM
14
621
0
06 Oct 2021
Objects in Semantic Topology
Shuo Yang
Pei Sun
Yi-Xin Jiang
Xiaobo Xia
Ruiheng Zhang
Zehuan Yuan
Changhu Wang
Ping Luo
Min Xu
ObjD
80
29
0
06 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
33
287
0
06 Oct 2021
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
50
114
0
05 Oct 2021
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog
Xingyao Wang
David Jurgens
17
5
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
50
10
0
24 Sep 2021
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
57
18
0
10 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
20
1
0
10 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
25
36
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
25
13
0
08 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
19
679
0
04 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
322
2,249
0
02 Sep 2021
Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation
Chenkai Sun
Weijian Li
Jinfeng Xiao
Nikolaus Nova Parulian
ChengXiang Zhai
Heng Ji
34
4
0
29 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
49
766
0
24 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
19
76
0
20 Aug 2021
Contrastive Language-Image Pre-training for the Italian Language
Federico Bianchi
Giuseppe Attanasio
Raphael Pisoni
Silvia Terragni
Gabriele Sarti
S. Lakshmi
VLM
CLIP
16
29
0
19 Aug 2021
Billion-Scale Pretraining with Vision Transformers for Multi-Task Visual Representations
Josh Beal
Hao Wu
Dong Huk Park
Andrew Zhai
Dmitry Kislyuk
ViT
11
29
0
12 Aug 2021
SoK: How Robust is Image Classification Deep Neural Network Watermarking? (Extended Version)
Nils Lukas
Edward Jiang
Xinda Li
Florian Kerschbaum
AAML
20
85
0
11 Aug 2021
Knowledge accumulating: The general pattern of learning
Zhuoran Xu
Hao Liu
CLL
10
0
0
09 Aug 2021
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Jinyuan Jia
Yupei Liu
Neil Zhenqiang Gong
SILM
SSL
8
150
0
01 Aug 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
13
64
0
30 Jul 2021
Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization
Chiyuan Zhang
M. Raghu
Jon M. Kleinberg
Samy Bengio
OOD
11
30
0
27 Jul 2021
Theoretical foundations and limits of word embeddings: what types of meaning can they capture?
Alina Arseniev-Koehler
23
18
0
22 Jul 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Mohit Bansal
ViT
14
62
0
20 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq R. Joty
Caiming Xiong
S. Hoi
FaML
19
1,876
0
16 Jul 2021
Previous
1
2
3
...
165
166
167
Next