2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,928 papers shown
Title
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
8
306
0
12 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
19
34
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
24
37
0
12 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
18
18
0
11 May 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
30
15
0
11 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
11
29
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
33
0
10 May 2022
Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training
Jing Yang
Junwen Chen
Keiji Yanai
ViT
13
5
0
10 May 2022
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
19
21
0
10 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
31
57
0
09 May 2022
Generating Representative Samples for Few-Shot Classification
Jingyi Xu
Hieu M. Le
VLM
9
61
0
05 May 2022
Relational Representation Learning in Visually-Rich Documents
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
16
12
0
05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
20
45
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
57
1,253
0
04 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
30
70
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Comparison of CoModGANs, LaMa and GLIDE for Art Inpainting- Completing M.C Escher's Print Gallery
Lucia Cipolina-Kun
Simone Caenazzo
Gaston Mazzei
19
2
0
03 May 2022
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
20
138
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Seeding Diversity into AI Art
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
22
4
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
156
0
30 Apr 2022
CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification
Marcos V. Conde
Kerem Turgutlu
CLIP
VLM
28
94
0
29 Apr 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
25
100
0
29 Apr 2022
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren
Kenny Q. Zhu
VLM
22
7
0
29 Apr 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
34
24
0
29 Apr 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
18
321
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
9
43
0
26 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
40
368
0
25 Apr 2022
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
19
72
0
23 Apr 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
14
8
0
23 Apr 2022
A Taxonomy of Prompt Modifiers for Text-To-Image Generation
J. Oppenlaender
15
102
0
20 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
57
367
0
18 Apr 2022
Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa
Issei Sato
AI4TS
14
4
0
18 Apr 2022
Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport
Yingtao Tian
Marco Cuturi
David R Ha
DiffM
OT
35
1
0
18 Apr 2022
StyleT2F: Generating Human Faces from Textual Description Using StyleGAN2
Mohamed Shawky Sabae
Mohamed Ahmed Dardir
Remonda Talaat Eskarous
M. Ebbed
CVBM
14
2
0
17 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
19
54
0
15 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
34
53
0
15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
16
63
0
15 Apr 2022
WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types
Xuwu Wang
Junfeng Tian
Min Gui
Zhixu Li
Rui-cang Wang
Ming Yan
Lihan Chen
Yanghua Xiao
VGen
24
48
0
13 Apr 2022
What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data
Oier Mees
Lukás Hermann
Wolfram Burgard
LM&Ro
28
149
0
13 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
67
6,622
0
13 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
19
123
0
12 Apr 2022
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
Gokul Karthik Kumar
Abhishek Singh Gehlot
Sahal Shaji Mullappilly
Karthik Nandakumar
21
13
0
12 Apr 2022
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour
Beat Gfeller
Qingqing Huang
A. Jansen
Scott Wisdom
Marco Tagliasacchi
22
30
0
12 Apr 2022
CLMLF: A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection
Zhen Li
Bing Xu
Conghui Zhu
T. Zhao
36
70
0
12 Apr 2022
Are Multimodal Transformers Robust to Missing Modality?
Mengmeng Ma
Jian Ren
Long Zhao
Davide Testuggine
Xi Peng
ViT
26
146
0
12 Apr 2022
XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation
Wei Liu
Fangyue Liu
Fei Din
Qian He
Zili Yi
VLM
14
36
0
11 Apr 2022
No Token Left Behind: Explainability-Aided Image Classification and Generation
Roni Paiss
Hila Chefer
Lior Wolf
VLM
26
29
0
11 Apr 2022
Robust Cross-Modal Representation Learning with Progressive Self-Distillation
A. Andonian
Shixing Chen
Raffay Hamid
VLM
17
55
0
10 Apr 2022
Semantic Exploration from Language Abstractions and Pretrained Representations
Allison C. Tam
Neil C. Rabinowitz
Andrew Kyle Lampinen
Nicholas A. Roy
Stephanie C. Y. Chan
D. Strouse
Jane X. Wang
Andrea Banino
Felix Hill
LM&Ro
13
67
0
08 Apr 2022