2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,339 papers shown
Title
X-Pool: Cross-Modal Language-Video Attention for Text-Video Retrieval
S. Gorti
Noël Vouitsis
Junwei Ma
Keyvan Golestan
M. Volkovs
Animesh Garg
Guangwei Yu
25
148
0
28 Mar 2022
Large-scale Bilingual Language-Image Contrastive Learning
ByungSoo Ko
Geonmo Gu
VLM
14
14
0
28 Mar 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
20
16
0
27 Mar 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta
Albert Gu
Jonathan Berant
34
288
0
27 Mar 2022
GEN-VLKT: Simplify Association and Enhance Interaction Understanding for HOI Detection
Yue Liao
Aixi Zhang
Miao Lu
Yongliang Wang
Xiaobo Li
Si Liu
VLM
22
124
0
26 Mar 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Rogerio Bonatti
LM&Ro
14
49
0
25 Mar 2022
CLIP-Mesh: Generating textured meshes from text using pretrained image-text models
N. Khalid
Tianhao Xie
Eugene Belilovsky
Tiberiu Popa
CLIP
6
291
0
24 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
17
507
0
24 Mar 2022
Open-Vocabulary DETR with Conditional Matching
Yuhang Zang
Wei Li
Kaiyang Zhou
Chen Huang
Chen Change Loy
ObjD
VLM
4
196
0
22 Mar 2022
Look for the Change: Learning Object States and State-Modifying Actions from Untrimmed Web Videos
Tomáš Souček
Jean-Baptiste Alayrac
Antoine Miech
Ivan Laptev
Josef Sivic
19
32
0
22 Mar 2022
WuDaoMM: A large-scale Multi-Modal Dataset for Pre-training models
Shan Yuan
Shuai Zhao
Jiahong Leng
Zhao Xue
Hanyu Zhao
Peiyu Liu
Zheng Gong
Wayne Xin Zhao
Junyi Li
Tang Jie
VLM
19
5
0
22 Mar 2022
Domain Generalization by Mutual-Information Regularization with Pre-trained Models
Junbum Cha
Kyungjae Lee
Sungrae Park
Sanghyuk Chun
OOD
15
131
0
21 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
22
103
0
21 Mar 2022
Open-Vocabulary One-Stage Detection with Hierarchical Visual-Language Knowledge Distillation
Zongyang Ma
Guan Luo
Jin Gao
Liang Li
Yuxin Chen
Shaoru Wang
Congxuan Zhang
Weiming Hu
VLM
ObjD
72
81
0
20 Mar 2022
Iwin: Human-Object Interaction Detection via Transformer with Irregular Windows
Danyang Tu
Xiongkuo Min
Huiyu Duan
G. Guo
Guangtao Zhai
Wei Shen
ViT
22
24
0
20 Mar 2022
CoWs on Pasture: Baselines and Benchmarks for Language-Driven Zero-Shot Object Navigation
S. Gadre
Mitchell Wortsman
Gabriel Ilharco
Ludwig Schmidt
Shuran Song
CLIP
LM&Ro
25
140
0
20 Mar 2022
BrainGB: A Benchmark for Brain Network Analysis with Graph Neural Networks
Hejie Cui
Wei Dai
Yanqiao Zhu
Xuan Kan
Antonio Aodong Chen Gu
Joshua Lukemire
Liang Zhan
Lifang He
Ying Guo
Carl Yang
6
110
0
17 Mar 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He
Gengshi Huang
Siyu Chen
Jianing Teng
Wang Kun
Zhen-fei Yin
Lu Sheng
Ziwei Liu
Yu Qiao
Jing Shao
VLM
SSL
ViT
22
7
0
16 Mar 2022
Pseudo-Q: Generating Pseudo Language Queries for Visual Grounding
Haojun Jiang
Yuanze Lin
Dongchen Han
Shiji Song
Gao Huang
ObjD
33
49
0
16 Mar 2022
Bamboo: Building Mega-Scale Vision Dataset Continually with Human-Machine Synergy
Yuanhan Zhang
Qi Sun
Yichun Zhou
Zexin He
Zhen-fei Yin
Kunze Wang
Lu Sheng
Yu Qiao
Jing Shao
Ziwei Liu
ObjD
VLM
11
19
0
15 Mar 2022
Disentangled Representation Learning for Text-Video Retrieval
Qiang Wang
Yanhao Zhang
Yun Zheng
Pan Pan
Xiansheng Hua
45
76
0
14 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
19
16
0
14 Mar 2022
Democratizing Contrastive Language-Image Pre-training: A CLIP Benchmark of Data, Model, and Supervision
Yufeng Cui
Lichen Zhao
Feng Liang
Yangguang Li
Jing Shao
UQCV
VLM
CLIP
17
43
0
11 Mar 2022
The Overlooked Classifier in Human-Object Interaction Recognition
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Lin Liang
Jenq-Neng Hwang
Zicheng Liu
VLM
33
8
0
10 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
24
906
1
10 Mar 2022
Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen
Peixia Li
Lei Bai
Leixian Qiao
Qiuhong Shen
Bo-wen Li
Weihao Gan
Wei Wu
Wanli Ouyang
ViT
VOT
20
182
0
10 Mar 2022
StyleBabel: Artistic Style Tagging and Captioning
Dan Ruta
Andrew Gilbert
Pranav Aggarwal
Naveen Marri
Ajinkya Kale
...
Hailin Jin
Baldo Faieta
Alex Filipkowski
Zhe-nan Lin
John Collomosse
15
12
0
10 Mar 2022
MVP: Multimodality-guided Visual Pre-training
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
11
104
0
10 Mar 2022
FlexIT: Towards Flexible Semantic Image Translation
Guillaume Couairon
Asya Grechka
Jakob Verbeek
Holger Schwenk
Matthieu Cord
DiffM
31
34
0
09 Mar 2022
A Simple Multi-Modality Transfer Learning Baseline for Sign Language Translation
Yutong Chen
Fangyun Wei
Xiao Sun
Zhirong Wu
Stephen Lin
SLR
17
94
0
08 Mar 2022
Language Matters: A Weakly Supervised Vision-Language Pre-training Approach for Scene Text Detection and Spotting
Chuhui Xue
Wenqing Zhang
Yu Hao
Shijian Lu
Philip H. S. Torr
Song Bai
VLM
27
31
0
08 Mar 2022
HyperPELT: Unified Parameter-Efficient Language Model Tuning for Both Language and Vision-and-Language Tasks
Zhengkun Zhang
Wenya Guo
Xiaojun Meng
Yasheng Wang
Yadao Wang
Xin Jiang
Qun Liu
Zhenglu Yang
26
15
0
08 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
8
1,358
0
07 Mar 2022
Off-Policy Evaluation in Embedded Spaces
Jaron J. R. Lee
David Arbour
Georgios Theocharous
OffRL
14
3
0
05 Mar 2022
Cross Language Image Matching for Weakly Supervised Semantic Segmentation
Jinheng Xie
Xianxu Hou
Kai Ye
Linlin Shen
CLIP
VLM
14
104
0
05 Mar 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
22
30,040
0
01 Mar 2022
CLIP-GEN: Language-Free Training of a Text-to-Image Generator with CLIP
Zihao W. Wang
Wei Liu
Qian He
Xin-ru Wu
Zili Yi
CLIP
VLM
179
71
0
01 Mar 2022
Multi-modal Alignment using Representation Codebook
Jiali Duan
Liqun Chen
Son Tran
Jinyu Yang
Yi Xu
Belinda Zeng
Trishul M. Chilimbi
17
66
0
28 Feb 2022
SemSup: Semantic Supervision for Simple and Scalable Zero-shot Generalization
Austin W. Hanjie
A. Deshpande
Karthik Narasimhan
VLM
15
2
0
26 Feb 2022
Reconstruction of Perceived Images from fMRI Patterns and Semantic Brain Exploration using Instance-Conditioned GANs
Furkan Ozcelik
Bhavin Choksi
Milad Mozafari
Leila Reddy
Rufin VanRullen
GAN
15
67
0
25 Feb 2022
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance
Zhuoning Yuan
Yuexin Wu
Zi-qi Qiu
Xianzhi Du
Lijun Zhang
Denny Zhou
Tianbao Yang
19
26
0
24 Feb 2022
StyleCLIPDraw: Coupling Content and Style in Text-to-Drawing Translation
Peter Schaldenbrand
Zhixuan Liu
Jean Oh
CLIP
27
43
0
24 Feb 2022
Indiscriminate Poisoning Attacks on Unsupervised Contrastive Learning
Hao He
Kaiwen Zha
Dina Katabi
AAML
20
31
0
22 Feb 2022
CaMEL: Mean Teacher Learning for Image Captioning
Manuele Barraco
Matteo Stefanini
Marcella Cornia
S. Cascianelli
Lorenzo Baraldi
Rita Cucchiara
ViT
VLM
25
27
0
21 Feb 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
13
177
0
18 Feb 2022
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
79
208
0
18 Feb 2022
Graph Masked Autoencoders with Transformers
Sixiao Zhang
Hongxu Chen
Haoran Yang
Xiangguo Sun
Philip S. Yu
Guandong Xu
8
17
0
17 Feb 2022
CATs++: Boosting Cost Aggregation with Convolutions and Transformers
Seokju Cho
Sunghwan Hong
Seung Wook Kim
ViT
19
34
0
14 Feb 2022
Do Lessons from Metric Learning Generalize to Image-Caption Retrieval?
Maurits J. R. Bleeker
Maarten de Rijke
SSL
DML
19
9
0
14 Feb 2022
Domain Adaptation via Prompt Learning
Chunjiang Ge
Rui Huang
Mixue Xie
Zihang Lai
Shiji Song
Shuang Li
Gao Huang
VPVLM
VLM
23
142
0
14 Feb 2022