Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,869 papers shown
Title
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
16
10
0
25 May 2022
End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models
Barry Menglong Yao
Aditya Shah
Lichao Sun
Jin-Hee Cho
Lifu Huang
MLLM
LRM
33
78
0
25 May 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Akari Asai
Mohammadreza Salehi
Matthew E. Peters
Hannaneh Hajishirzi
120
100
0
24 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
14
4
0
24 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
8
0
0
23 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
55
5,762
0
23 May 2022
Decoder Denoising Pretraining for Semantic Segmentation
Emmanuel B. Asiedu
Simon Kornblith
Ting Chen
Niki Parmar
Matthias Minderer
Mohammad Norouzi
AI4CE
180
26
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
21
38
0
23 May 2022
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks
Lukas Huber
Robert Geirhos
Felix Wichmann
46
16
0
20 May 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
16
68
0
19 May 2022
Voxel-informed Language Grounding
Rodolfo Corona
Shizhan Zhu
Dan Klein
Trevor Darrell
133
11
0
19 May 2022
TransTab: Learning Transferable Tabular Transformers Across Tables
Zifeng Wang
Jimeng Sun
LMTD
23
135
0
19 May 2022
Sparse Visual Counterfactual Explanations in Image Space
Valentyn Boreiko
Maximilian Augustin
Francesco Croce
Philipp Berens
Matthias Hein
BDL
CML
23
25
0
16 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
28
159
0
16 May 2022
Aligning Robot Representations with Humans
Andreea Bobu
Andi Peng
8
0
0
15 May 2022
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training
C. Seibold
Simon Reiß
M. Sarfraz
Rainer Stiefelhagen
Jens Kleesiek
11
31
0
14 May 2022
A Comprehensive Survey of Few-shot Learning: Evolution, Applications, Challenges, and Opportunities
Yisheng Song
Ting-Yuan Wang
S. Mondal
J. P. Sahoo
SLR
36
342
0
13 May 2022
PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning
Hongbin Liu
Jinyuan Jia
Neil Zhenqiang Gong
25
34
0
13 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
26
6
0
12 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Xiao Wang
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
8
304
0
12 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
19
34
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
21
37
0
12 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
13
18
0
11 May 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
25
15
0
11 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
11
29
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
33
0
10 May 2022
Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training
Jing Yang
Junwen Chen
Keiji Yanai
ViT
11
5
0
10 May 2022
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
19
21
0
10 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
31
57
0
09 May 2022
Generating Representative Samples for Few-Shot Classification
Jingyi Xu
Hieu M. Le
VLM
9
61
0
05 May 2022
Relational Representation Learning in Visually-Rich Documents
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
16
12
0
05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
16
45
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
57
1,253
0
04 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
30
70
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Comparison of CoModGANs, LaMa and GLIDE for Art Inpainting- Completing M.C Escher's Print Gallery
Lucia Cipolina-Kun
Simone Caenazzo
Gaston Mazzei
16
2
0
03 May 2022
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
14
138
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
12
16
0
02 May 2022
Seeding Diversity into AI Art
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
22
4
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
21
156
0
30 Apr 2022
CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification
Marcos V. Conde
Kerem Turgutlu
CLIP
VLM
28
94
0
29 Apr 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
17
100
0
29 Apr 2022
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren
Kenny Q. Zhu
VLM
22
7
0
29 Apr 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
31
24
0
29 Apr 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
18
321
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
9
43
0
26 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
40
368
0
25 Apr 2022
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
19
72
0
23 Apr 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
14
8
0
23 Apr 2022
A Taxonomy of Prompt Modifiers for Text-To-Image Generation
J. Oppenlaender
15
102
0
20 Apr 2022
Previous
1
2
3
...
170
171
172
...
176
177
178
Next