arXiv:2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 8,849 papers shown
Title
FLAIR: Federated Learning Annotated Image Repository
Congzheng Song
Filip Granqvist
Kunal Talwar
FedML
11
27
0
18 Jul 2022
Zero-Shot Temporal Action Detection via Vision-Language Prompting
Sauradip Nag
Xiatian Zhu
Yi-Zhe Song
Tao Xiang
VLM
20
65
0
17 Jul 2022
X-CLIP: End-to-End Multi-grained Contrastive Learning for Video-Text Retrieval
Yiwei Ma
Guohai Xu
Xiaoshuai Sun
Ming Yan
Ji Zhang
Rongrong Ji
CLIP
VLM
10
266
0
15 Jul 2022
Is one annotation enough? A data-centric image classification benchmark for noisy and ambiguous label estimation
Lars Schmarje
Vasco Grossmann
Claudius Zelenka
S. Dippel
R. Kiko
...
M. Pastell
J. Stracke
A. Valros
N. Volkmann
Reinhard Koch
31
34
0
13 Jul 2022
Towards Highly Expressive Machine Learning Models of Non-Melanoma Skin Cancer
S. Thomas
J. Lefevre
Glenn W. Baxter
N. Hamilton
MedIm
8
2
0
09 Jul 2022
Transformer Neural Processes: Uncertainty-Aware Meta Learning Via Sequence Modeling
Tung Nguyen
Aditya Grover
BDL
UQCV
19
99
0
09 Jul 2022
Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
H. Rasheed
Muhammad Maaz
Muhammad Uzair Khattak
Salman Khan
F. Khan
ObjD
VLM
23
151
0
07 Jul 2022
FewSOL: A Dataset for Few-Shot Object Learning in Robotic Environments
Jishnu Jaykumar P
Yu-Wei Chao
Yu Xiang
17
11
0
06 Jul 2022
Open-Vocabulary 3D Detection via Image-level Class and Debiased Cross-modal Contrastive Learning
Yuheng Lu
Chenfeng Xu
Xi Wei
Xiaodong Xie
M. Tomizuka
Kurt Keutzer
Shanghang Zhang
3DPC
13
20
0
05 Jul 2022
BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation
Geondo Park
Jaehong Yoon
H. Zhang
Xingge Zhang
S. Hwang
Yonina C. Eldar
MQ
15
1
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
87
93
0
04 Jul 2022
Can Language Understand Depth?
Renrui Zhang
Ziyao Zeng
Ziyu Guo
Yafeng Li
VLM
MDE
13
71
0
03 Jul 2022
Divert More Attention to Vision-Language Tracking
Mingzhe Guo
Zhipeng Zhang
Heng Fan
Li Jing
19
53
0
03 Jul 2022
Chat-to-Design: AI Assisted Personalized Fashion Design
Weiming Zhuang
Chongjie Ye
Ying Xu
Pengzhi Mao
Shuai Zhang
14
1
0
03 Jul 2022
Counterfactually Measuring and Eliminating Social Bias in Vision-Language Pre-training Models
Yi Zhang
Junyan Wang
Jitao Sang
14
27
0
03 Jul 2022
Enabling Harmonious Human-Machine Interaction with Visual-Context Augmented Dialogue System: A Review
Hao Wang
Bin Guo
Y. Zeng
Yasan Ding
Chen Qiu
Ying Zhang
Li Yao
Zhiwen Yu
25
2
0
02 Jul 2022
ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022
Na Liu
Xiaohan Wang
Xiaobo Li
Yi Yang
Yueting Zhuang
15
18
0
01 Jul 2022
(Un)likelihood Training for Interpretable Embedding
Jiaxin Wu
Chong-Wah Ngo
W. Chan
Zhijian Hou
12
2
0
01 Jul 2022
e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
Wonyoung Shin
Jonghun Park
Taekang Woo
Yongwoo Cho
Kwangjin Oh
Hwanjun Song
VLM
14
16
0
01 Jul 2022
Measuring Forgetting of Memorized Training Examples
Matthew Jagielski
Om Thakkar
Florian Tramèr
Daphne Ippolito
Katherine Lee
...
Eric Wallace
Shuang Song
Abhradeep Thakurta
Nicolas Papernot
Chiyuan Zhang
TDI
31
102
0
30 Jun 2022
GSCLIP: A Framework for Explaining Distribution Shifts in Natural Language
Zhiying Zhu
Weixin Liang
James Y. Zou
26
9
0
30 Jun 2022
Distilling Model Failures as Directions in Latent Space
Saachi Jain
Hannah Lawrence
Ankur Moitra
A. Madry
16
89
0
29 Jun 2022
LViT: Language meets Vision Transformer in Medical Image Segmentation
Zihan Li
Yunxiang Li
Qingde Li
Puyang Wang
Dazhou Guo
Le Lu
D. Jin
You Zhang
Qingqi Hong
VLM
MedIm
57
131
0
29 Jun 2022
Language-Based Audio Retrieval with Converging Tied Layers and Contrastive Loss
Andrew Koh
Chng Eng Siong
18
1
0
29 Jun 2022
ZSON: Zero-Shot Object-Goal Navigation using Multimodal Goal Embeddings
Arjun Majumdar
Gunjan Aggarwal
Bhavika Devnani
Judy Hoffman
Dhruv Batra
LM&Ro
147
149
0
24 Jun 2022
A Fast Text-Driven Approach for Generating Artistic Content
M. Lupascu
Ryan Murdock
Ionut Mironica
Yijun Li
16
1
0
22 Jun 2022
Open Vocabulary Object Detection with Proposal Mining and Prediction Equalization
Peixian Chen
Kekai Sheng
Mengdan Zhang
Mingbao Lin
Yunhang Shen
Shaohui Lin
Bo Ren
Ke Li
VLM
ObjD
23
27
0
22 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
85
1,061
0
22 Jun 2022
Bridging the Gap Between Indexing and Retrieval for Differentiable Search Index with Query Generation
Shengyao Zhuang
Houxing Ren
Linjun Shou
Jian Pei
Ming Gong
Guido Zuccon
Daxin Jiang
25
64
0
21 Jun 2022
GaLeNet: Multimodal Learning for Disaster Prediction, Management and Relief
Rohit Saha
Meng Fang
Angeline Yasodhara
Kyryl Truskovskyi
Azin Asgarian
D. Homola
Raahil Shah
Frederik Dieleman
Jack Weatheritt
Thomas Rogers
19
3
0
18 Jun 2022
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems
Giannis Daras
Y. Dagan
A. Dimakis
C. Daskalakis
BDL
21
15
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Y. S. Rawat
M. Shah
SSL
22
130
0
18 Jun 2022
Landscape Learning for Neural Network Inversion
Ruoshi Liu
Chen-Guang Mao
Purva Tendulkar
Hongya Wang
Carl Vondrick
16
8
0
17 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
20
43
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
45
391
0
17 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
42
347
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
19
11
0
17 Jun 2022
Rectify ViT Shortcut Learning by Visual Saliency
Chong Ma
Lin Zhao
Yuzhong Chen
David Liu
Xi Jiang
Tuo Zhang
Xintao Hu
Dinggang Shen
Dajiang Zhu
Tianming Liu
ViT
20
20
0
17 Jun 2022
Rarity Score: A New Metric to Evaluate the Uncommonness of Synthesized Images
Jiyeon Han
Hwanil Choi
Yunjey Choi
Jae Hyun Kim
Jung-Woo Ha
Jaesik Choi
EGVM
10
31
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
34
226
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
16
64
0
16 Jun 2022
Disentangling visual and written concepts in CLIP
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
12
46
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
17
123
0
15 Jun 2022
A Meta-Analysis of Distributionally-Robust Models
Ben Feuer
Ameya Joshi
C. Hegde
OOD
VLM
17
3
0
15 Jun 2022
Forecasting of depth and ego-motion with transformers and self-supervision
Houssem-eddine Boulahbal
A. Voicila
Andrew I. Comport
ViT
MDE
19
3
0
15 Jun 2022
Zero-shot object goal visual navigation
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
19
37
0
15 Jun 2022
Differentiable Top-k Classification Learning
Felix Petersen
Hilde Kuehne
Christian Borgelt
Oliver Deussen
41
28
0
15 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
17
0
0
14 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
18
81
0
14 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan N. Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
27
148
0
14 Jun 2022